- Jul 2024
-
www.biorxiv.org
-
Reviewer #3 (Public Review):
Summary:
Cardiac lymphatics have been shown to be essential for cardiac health and function. Moreover, after myocardial infarction, stimulating lymphangiogenesis has been shown to improve cardiac inflammation, fibrosis, and function. The aim of this study was therefore to evaluate the transcriptomic changes in cardiac lymphatic endothelial cells (LECs) after myocardial infarction, which could reveal new therapeutic targets related to lymphatic function. In addition, investigating cell-cell communication between lymphatic and immune cells would provide critical information for a better understanding of the disease.
Strengths:
The use of scRNAseq data to evaluate LECs is an effective strategy considering the small proportion of LECs compared to blood endothelial cells. The authors performed extensive bioinformatic analysis on three different data sets.
Weaknesses:
Among a total of 44,860 cells, only 242 LECs and 5,688 endothelial cells were identified. This small number of LECs is not representative and is insufficient to reliably distinguish four different clusters. The bioinformatic analysis is not supported by significant results in their in vivo and in vitro experiments.
-
-
www.biorxiv.org
-
eLife assessment
This useful study presents a real-time transcriptomics analysis, with the aim of providing rapid access to sequenced data to reduce the costs associated with Oxford Nanopore long-read technology. Although the authors illustrate the compelling utility of this approach with three diverse experimental setups, issues with study design and analysis result in incomplete supporting evidence.
-
Reviewer #1 (Public Review):
Summary:
In this study, the authors developed three case studies: (1) transcriptome profiling of two human cell cultures (HEK293 and HeLa), (2) identification of experimentally enriched transcripts in cell culture (RiboMinus and RiboPlus treatments), and (3) identification of experimentally manipulated genes in yeast strains (gene knockouts or strains transformed with plasmids containing the deleted gene for overexpression). Sequencing was performed using Oxford Nanopore Technologies (ONT), the only technology that allows for real-time analysis. The real-time transcriptomic analysis was performed using NanopoReaTA, a recent toolbox for comparative transcriptional analyses of Nanopore-seq data, developed by the group (Wierczeiko and Pastore et al. 2023). The authors aimed to demonstrate the use of their tool on data generated by ONT, showing its versatility and the potential for cost reduction, since ONT sequencing can be stopped as soon as enough data have been collected.
Strengths:
Given that Oxford Nanopore Technologies offers real-time sequencing, it is extremely useful to develop tools that allow real-time data analysis in parallel with data generation. The authors demonstrated that this strategy is possible for both human cell lines and yeasts in the case studies presented. It is a useful strategy for the scientific community and it has the potential to be integrated into clinical applications for rapid and cost-effective quality checks in specific experiments such as overexpression of genes.
Weaknesses:
For a proper statistical analysis of the RNA-Seq data, a greater number of replicates should have been included. The experiments were conducted with a minimal number of replicates (2 replicates for case studies 1 and 2, and 3 replicates for case study 3).
Regarding the experimental part, some problems were observed in the conversion to double-stranded and loading for Nanopore-Seq, which were detailed in Supplementary Material 2. This fact is probably reflected in the results where a reduction in the overall sequencing throughput and detected gene number for HEK293 compared to HeLa were observed (data presented in Supplementary Figure 2). It is necessary to use similar quantities of RNA/cDNA since the sequencing occurs in real-time. The authors should have standardized the experimental conditions to proceed with the sequencing and perform the analyses.
-
Reviewer #2 (Public Review):
Summary:
Transcriptomics technologies play important roles in biological studies. Technologies based on second-generation sequencing, such as mRNA-seq, face some serious obstacles, including isoform analysis, due to short read length. Third-generation sequencing technologies perfectly solve these problems by having long reads, but they are much more expensive. The authors presented a useful real-time strategy to minimize the cost of sequencing with Oxford Nanopore Technologies (ONT). The authors performed three sets of experiments to illustrate the utility of the real-time strategy. However, due to the problems in experimental design and analysis, their aims are not completely achieved. If the authors can significantly improve the experiments and analysis, the strategy they proposed will guide biologists to conduct transcriptomics studies with ONT in a fast and cost-effective way and help studies in both basic research and clinical applications.
Strengths:
The authors have recently developed a computational tool called NanopoReaTA to perform real-time analysis when cDNA/RNA samples are sequenced with ONT (Wierczeiko et al., 2023). The advantage of real-time analysis is that the sequencing can be stopped once enough data is collected to save cost. Here, they described three sets of experiments: a comparison between two human cell lines, a comparison among RNA preparation procedures, and a comparison between genetically modified yeasts. Their results show that the real-time strategy works for different species and different RNA preparation methods.
Weaknesses:
However, especially considering that the computational tool NanopoReaTA is their previous work, the authors should present more helpful guidelines to perform real-time ONT analysis and more advanced analysis methods. There are four major weaknesses:
(1) For all three sets of experiments, the authors focused on sample clustering and gene-level differential expression analysis (DEA), and performed little analysis at the isoform level, none of which appears in the main-text figures. Sample clustering and gene-level DEA can be done easily and well using mRNA-seq at a much lower cost. Even for initial data quality checking, mRNA-seq can first be run on an Illumina MiSeq/NextSeq, which is quick, before deep sequencing on a HiSeq/NovaSeq. The real power of third-generation RNA sequencing is isoform analysis, enabled by the long read length. At least for now, PacBio Iso-seq is very expensive and one cannot analyze the data in real-time. Thus, the authors should focus on the real-time isoform analysis of ONT to show the advantages.
(2) The sample sizes are too small in all three sets of experiments: only two for sets 1 and 2, and three for set 3. For DEA, three is the minimal number for proper statistics, but a sample size of three always leads to very poor power. Nowadays, a proper transcriptomics study usually has a larger sample size. Besides the power issue, biological samples often contain outliers for many reasons. It is crucial to show whether the real-time analysis also works for larger sample sizes, such as 10 per group, i.e., 20 samples in total. Will the performance still hold as the sample number increases? What is the maximum sample number for an ONT run? If the samples need to be split into multiple runs, how will the real-time analysis be adjusted? These questions are quite useful for researchers who plan to use ONT.
(3) According to the manuscript, real-time analysis checks the sequencing data at a few time points. In statistics this is usually called sequential analysis or interim analysis, and it is commonly performed in clinical trials to save cost. Care must be taken while performing these analyses, as repeated checks on the data can inflate the type I error rate. Thus, the authors should develop a sequential analysis procedure for real-time RNA sequencing.
(4) The experimental set 1 (comparison between two completely different human cell lines) and experimental set 2 (comparison among RNA preparation procedures) are not quite biologically meaningful. If it is possible, it is better for the authors to perform an experiment more similar to a real situation for biological discovery. Then the manuscript can attract more researchers to follow its guidelines.
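Point (3) above, on repeated interim checks, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' pipeline: it uses a toy z-test on normally distributed data with no true effect, and the function names, look counts, and sample sizes are hypothetical. It estimates how often at least one of several interim looks crosses the nominal threshold, and how a simple Bonferroni-style per-look threshold changes that rate.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(n_looks, alpha=0.05, n_per_look=20, n_trials=2000, seed=0):
    """Fraction of null experiments declared significant at any interim look."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # New data arrive between looks; the true effect is zero.
            total += sum(rng.gauss(0.0, 1.0) for _ in range(n_per_look))
            n += n_per_look
            z = total / math.sqrt(n)  # z-statistic for mean = 0, known unit variance
            if two_sided_p(z) < alpha:
                rejections += 1
                break  # stop at the first "significant" look
    return rejections / n_trials


if __name__ == "__main__":
    print("1 look :", false_positive_rate(1))
    print("5 looks:", false_positive_rate(5))
    print("5 looks, per-look alpha = 0.05/5:", false_positive_rate(5, alpha=0.05 / 5))
```

A Bonferroni split is conservative; group-sequential boundaries from the clinical-trial literature spend the error rate more efficiently, but the simple split is enough to show why an uncorrected stopping rule is a concern.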
-
-
www.biorxiv.org
-
Author response:
Reviewer #1 (Public Review):
Weaknesses:
The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq). Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, the work of Kumar et al. is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines. In addition, Kumar et al. did not provide any DNA-FISH validation for their methods. Therefore, the interesting correlations described in this work are not based on robust technologies.
An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs.
We disagree with the statement that the TSA-seq approach and data have not been validated by orthogonal approaches, and with the conclusion that the TSA-seq approach is not robust, as summarized here and detailed below in “Specific Comments”. TSA-seq is robust because it is based only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide-free radical. TSA-seq has been extensively validated by microscopy and by the orthogonal genomic measurements provided by LMNB1 DamID and NAD-seq. This includes: a) the initial validation by FISH of both nuclear speckle (to an accuracy of ~50 nm) and nuclear lamina TSA-seq and the cross-validation of nuclear lamina TSA-seq with lamin B1 DamID in a first publication (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108); b) the further validation of SON TSA-seq by FISH in a second publication (Zhang et al, Genome Research 2021, doi:10.1101/gr.266239.120); c) the cross-validation of nucleolar TSA-seq using NAD-seq and the validation by light microscopy of the predictions of differences in the relative distributions of centromeres, nuclear speckles, and nucleoli made from nuclear speckle, nucleolar, and pericentric heterochromatin TSA-seq in the Kumar et al, bioRxiv preprint (which is in a last revision stage involving additional formatting for the journal requirements) doi:https://doi.org/10.1101/2023.10.29.564613; d) the extensive validation of nuclear speckle, LMNB1, and nucleolar TSA-seq generated in HFF human fibroblasts using published light microscopy distance measurements of hundreds of probes generated by multiplexed immuno-FISH MERFISH data (Su et al, Cell 2020, https://doi.org/10.1016/j.cell.2020.07.032), as we described for nucleolar TSA-seq in the Kumar et al, bioRxiv preprint and to some extent for LMNB1 and SON TSA-seq in the current manuscript version (see Specific Comments with attached Author response image 2).
Reviewer 1 raised concerns regarding this FISH validation given that the HFF TSA-seq and DamID data was compared to IMR90 MERFISH measurements. The Su et al, Cell 2020 MERFISH paper came out well after the 4D Nucleome Consortium settled on HFF as one of the two main “Tier 1” cell lines. We reasoned that the nuclear genome organization in a second fibroblast cell line would be sufficiently similar to justify using IMR90 FISH data as a proxy for our analysis of our HFF data. Indeed, there is a high correlation between the HFF TSA-seq and distances measured by MERFISH to nuclear lamina, nucleoli, and nuclear speckles (Author response image 1). Comparing HFF SON-TSA-seq data with published IMR90 SON TSA-seq data (Alexander et al, Mol Cell 2021, doi.org/10.1016/j.molcel.2021.03.006), the HFF SON TSA-seq versus MERFISH scatterplot is very similar to the IMR90 SON TSA-seq versus MERFISH scatterplot. We acknowledge the validation provided by the IMR90 MERFISH is limited by the degree to which genome organization relative to nuclear locales is similar in IMR90 and HFF fibroblasts. However, the correlation between measured microscopic distances from nuclear lamina, nucleoli, and nuclear speckles and TSA-seq scores is already quite high. We anticipate the conclusions drawn from such comparisons are solid and will only become that much stronger with future comparisons within the same cell line.
Author response image 1.
Scatterplots showing the correlation between TSA-seq and MERFISH microscopic distances. Top: IMR90 SON TSA-seq (from Alexander et al, Mol Cell 2021) (left) and HFF SON TSA-seq (right) (x-axis) versus distance to nuclear speckles (y-axis). Bottom: HFF Lamin B1 TSA-seq (x-axis) versus distance to nuclear lamina (y-axis) (left) and HFF MKI67IP (nucleolar) TSA-seq (x-axis) versus distance to nucleolus (y-axis) (right).
In our revision, we will add justification of the use of IMR90 fibroblasts as a proxy for HFF fibroblasts through comparison of available data sets.
Reviewer #2 (Public Review):
Weaknesses:
The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships. Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization). The study would benefit from a clear and specific hypothesis.
We acknowledge that this study was hypothesis-generating rather than hypothesis-testing in its goal. This research was funded through the NIH 4D-Nucleome Consortium, which had as its initial goal the development, benchmarking, and validation of new genomic technologies. Our Center focused on the mapping of the genome relative to different nuclear locales and the correlation of this intranuclear positioning of the genome with functions, specifically gene expression and DNA replication timing. By its very nature, this project has taken a discovery-driven rather than hypothesis-driven scientific approach. Our question fundamentally was whether we could gain new insights into nuclear genome organization through the integration of genomic and microscopic measurements of chromosome positioning relative to multiple different nuclear compartments/bodies and their correlation with functional assays such as RNA-seq and Repli-seq.
Indeed, as described in this manuscript, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure. We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology. We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships.
Given the extensive scope of this manuscript, we were limited in the extent that we could describe and summarize the background, data, analysis, and significance for every new insight. In our editing to reach the eLife recommended word count, we removed some of the explanations and summaries that we had originally included.
As suggested by Reviewer 2, in our revision we will add back additional summary and interpretation statements to help readers unfamiliar with 3D genome organization.
Specific Comments in response to Reviewer 1:
(1) We disagree with the comment that TSA-seq has not been cross-validated by other orthogonal genomic methods. In the first TSA-seq paper (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108), we showed a good correlation between the identification of iLADs and LADs by nuclear lamin and nuclear speckle TSA-seq and the orthogonal genomic method of lamin B1 DamID, which is reproduced using our new TSA-seq 2.0 protocol in this manuscript. Similarly, in the Kumar et al, bioRxiv preprint (doi:https://doi.org/10.1101/2023.10.29.564613), we showed a general agreement between the identification of NADs by nucleolar TSA-seq and the orthogonal genomic method of NAD-seq. (We expect this preprint to be in press soon; it is now undergoing a last revision involving only reformatting for journal requirements.) Additionally, we also showed a high correlation between Hi-C compartments and subcompartments and TSA-seq in the Chen et al, JCB 2018 paper. Specifically, there is an excellent correlation between the A1 Hi-C subcompartment and Speckle Associated Domains as detected by nuclear speckle TSA-seq. Additionally, the A2 Hi-C subcompartment correlated well with iLAD regions with intermediate nuclear speckle TSA-seq scores, and the B2 and B3 Hi-C subcompartments with LADs detected by both LMNB TSA-seq and LMNB1 DamID. More generally, Hi-C A and B compartment identity correlated well with predictions of iLADs versus LADs from nuclear speckle and nuclear lamina TSA-seq.
(2) In the Chen et al, JCB 2018 paper we also qualitatively and quantitatively validated TSA-seq using FISH. Qualitatively, we showed that both nuclear speckle and nuclear lamin TSA-seq correlated well with distances to nuclear speckles versus the nuclear lamina, respectively, measured by immuno-FISH.
Quantitatively, we showed that SON TSA-seq could be used to estimate the microscopic mean distance to nuclear speckles with mean and median residuals of ~50 nm. First, we used light microscopy to show that the spreading of tyramide-biotin signal from a point-source of TSA staining fits well with the exponential decay predicted theoretically by reaction-diffusion equations assuming a steady rate of tyramide-biotin free radical generation by the HRP enzyme and a constant probability throughout the nucleus of free-radical quenching (through reaction with protein tyrosine residues and nucleic acids). Second, we used the exponential decay constant measured by light microscopy together with FISH measurements of mean speckle distance for several genomic regions to fit an exponential function and to predict distance to nuclear speckles genome-wide directly from SON TSA-seq sequencing reads. Third, we used this approach to test the predictions against a new set of FISH measurements, demonstrating an accuracy of these predictions of ~50 nm.
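The calibration described in this paragraph can be sketched as follows. This is an illustrative reconstruction under assumptions, not the published code: it assumes a model of the form signal = y0 + A · exp(−k · distance), with the decay rate k taken as known from microscopy and A and y0 fitted by least squares to a handful of FISH-calibrated loci. All names and numbers are hypothetical.

```python
import math


def fit_amplitude_offset(distances, signals, decay_const):
    """Least-squares fit of signal = y0 + A * exp(-decay_const * d) for A and y0."""
    xs = [math.exp(-decay_const * d) for d in distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, signals))
    amplitude = sxy / sxx
    offset = mean_y - amplitude * mean_x
    return amplitude, offset


def predict_distance(signal, amplitude, offset, decay_const):
    """Invert the fitted exponential to estimate mean distance from a signal value."""
    ratio = (signal - offset) / amplitude
    if ratio <= 0:
        raise ValueError("signal is at or below the fitted background offset")
    return -math.log(ratio) / decay_const


if __name__ == "__main__":
    k = 0.004  # hypothetical decay rate per nm, assumed known from microscopy
    fish_distances = [100.0, 300.0, 600.0, 900.0]  # hypothetical calibration loci (nm)
    tsa_signals = [0.5 + 4.0 * math.exp(-k * d) for d in fish_distances]
    A, y0 = fit_amplitude_offset(fish_distances, tsa_signals, k)
    print(predict_distance(0.5 + 4.0 * math.exp(-k * 450.0), A, y0, k))
```

Fixing the decay rate from an independent measurement leaves only a linear fit, which is why a small number of calibration loci can suffice in a scheme like this.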
(3) The importance of the quantitative validation by immuno-FISH of using TSA-seq to estimate mean distance to nuclear speckles is that it demonstrates the robustness of the TSA-seq approach. Specifically, it shows how the TSA-seq signal is predicted to depend only on the specificity of the primary and secondary antibody staining and the diffusion properties of the tyramide-biotin free radicals produced by the HRP peroxidase. This is fundamentally different from the significant dependence on antibodies and choice of marker proteins for molecular proximity assays such as DamID, ChIP-seq, and Cut and Run/Tag which depend on molecular proximity for labeling and/or pulldown of DNA.
This robustness leads to specific predictions. First, it predicts similar TSA-seq signals will be produced using antibodies against different marker proteins for the same nuclear compartment. This is because the exponential decay constant (distance at which the signal drops by one half) for the spreading of the TSA is in the range of several hundred nm, as measured by light microscopy for several TSA staining conditions. Indeed, we showed in the Chen et al, JCB 2018 paper that antibodies against two different nuclear speckle proteins produced very similar TSA-seq signals while antibodies against LMNB versus LMNA also produced very similar TSA-seq signals. Similarly, we showed in the Kumar et al preprint that antibodies against four different nucleolar proteins showed similar TSA-seq signals, with the highest correlation coefficients for the TSA-seq signals produced by the antibodies against two GC nucleolar marker proteins and the TSA-seq signals produced by the antibodies against two FC/DFC nucleolar marker proteins.
Author response image 2.
Comparison of TSA-seq data from different cell lines versus IMR90 MERFISH. The observed correlation between SON (nuclear speckle) TSA-seq versus MERFISH is nearly as high for TSA-seq data from HFF as it is for TSA-seq data from the IMR90 cell line (Alexander et al, Mol Cell 2021) in which the MERFISH was performed. The correlations for SON, LMNB1 (nuclear lamina) and MKI67IP (nucleolus) versus MERFISH are highest for HFF TSA-seq data as compared to TSA-seq data from other cell lines (H1, K562, HCT116). Comparison of measured distances to nuclear locale (y-axis) versus TSA-seq scores (x-axis) from different cell lines labeled in red. Left to right: SON, LMNB1, and MKI67IP. Top to bottom: SON TSA-seq versus MERFISH for two TSA-seq replicates; TSA-seq from HFF, H1, K562, and HCT116 versus MERFISH.
Second, it predicts that the quantitative relationship between TSA-seq signal and mean distance from a nuclear compartment will depend on the convolution of the predicted exponential decay of spreading of the TSA signal produced by a point source with the more complicated staining distribution of nuclear compartments such as the nuclear lamina or nucleoli. We successfully used this concept to explain the differences emerging between LMNB1 DamID and TSA-seq signals for flat nuclei and to recognize the polarized distribution of different LADs over the nuclear periphery.
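The flat-nucleus effect mentioned here can be illustrated with a deliberately crude one-dimensional sketch. It assumes two lamina surfaces at the bottom and top of the nucleus, each contributing an exponentially decaying signal along the height axis, and a hypothetical half-distance of 0.3 µm. The geometry, names, and numbers are all illustrative assumptions rather than values from the manuscript.

```python
import math


def lamina_signal(z, height, half_distance):
    """Summed TSA signal at depth z from lamina sheets at z = 0 and z = height."""
    k = math.log(2.0) / half_distance  # decay rate implied by the half-distance
    return math.exp(-k * z) + math.exp(-k * (height - z))


def midplane_to_surface_ratio(height, half_distance=0.3):
    """Signal at the nuclear mid-plane relative to the signal at the lamina."""
    mid = lamina_signal(height / 2.0, height, half_distance)
    surface = lamina_signal(0.0, height, half_distance)
    return mid / surface


if __name__ == "__main__":
    # Hypothetical nuclear heights in micrometres: a flat and a round nucleus.
    for height in (1.5, 10.0):
        print(height, midplane_to_surface_ratio(height))
```

In the flat case the two surfaces both contribute at the mid-plane, so interior loci retain an appreciable lamina signal. This is one way a diffusion-based readout can diverge from a contact-based one such as DamID.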
(4) After our genomic data production and during our data analysis, a valuable resource from the Zhuang lab was published, using MERFISH to visualize hundreds of genomic loci in IMR90 cells. We acknowledge that the much more extensive validation of TSA-seq by the multiplexed immuno-FISH MERFISH data is dependent on the degree to which the nuclear genome organization is similar between IMR90 and HFF fibroblasts. However, the correlation between distances to nuclear speckles, nucleoli, and the nuclear lamina measured in IMR90 fibroblasts and the nuclear speckle, nucleolar, and nuclear lamina TSA-seq measured in HFF fibroblasts is already striking (see Author response image 1). With regard to SON TSA-seq, the MERFISH versus HFF TSA-seq correlation is close to what we observe using published IMR90 SON TSA-seq data (correlation coefficients of 0.89 (IMR90 TSA-seq) versus 0.86 (HFF TSA-seq)). Moreover, this correlation is highest using TSA-seq data from HFF cells as compared to the three other cell lines (see Author response image 2). We believe these correlations can be considered a lower bound on the actual correlations between the FISH distances and TSA-seq that we would have observed if we had performed both assays on the same cell line.
(5) Currently, we still require tens of millions of cells to perform each TSA-seq assay. This requires significant expansion of cells and a resulting increase in passage numbers of the IMR90 cells before we can perform the TSA-seq. During this expansion we observe a noticeable slowing of the IMR90 cell growth, as expected for secondary cell lines as we approach the Hayflick limit. We still do not know to what degree nuclear organization relative to nuclear locales may change as a function of cell cycle composition (i.e., percentage of cycling versus quiescent cells) and cell age. Thus, even if we performed TSA-seq on IMR90 cells, we would be comparing MERFISH from lower passages with a higher percentage of actively proliferating cells with TSA-seq from higher passages with a higher percentage of quiescent cells.
We are currently working on a new TSA-seq protocol that will work with thousands of cells. We believe it is a better investment of time and resources to wait until this new protocol is optimized before we repeat TSA-seq in IMR90 cells for a better comparison with multiplexed FISH data.
Specific Comments in response to Reviewer 2:
(1) As we acknowledge in our Response summary, we were limited in the degree to which we could actually follow up our findings with experiments designed to test specific hypotheses generated by our data. However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing. This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina. Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells. Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.
We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.
In response to Reviewer 2, we are modifying the text to make this clearer. We will explicitly describe how we tested the hypothesis that distance to the nuclear lamina is correlated with, but not causally linked to, gene expression: we used a DKO of LMNA and LBR to change distances relative to the nuclear lamina and then tested the effect on gene expression.
-
-
www.biorxiv.org
-
eLife assessment
This study develops a useful metric for quantifying codon usage adaptation - the Codon Adaptation Index of Species (CAIS). This metric permits direct comparisons of the strength of selection at the molecular level across species. The study is based on solid evidence, and the authors identify relationships between CAIS and the presence of disordered protein domains. Other correlations, such as the one between CAIS and body size, are weak and non-significant. In summary, the study introduces an interesting new approach to quantifying codon usage across species, which may be helpful in attempts to measure selection at the molecular level.
-
Reviewer #2 (Public Review):
Assessment
This study develops a potentially useful metric for quantifying codon usage adaptation – the Codon Adaptation Index of Species (CAIS) – that is intended to allow for more direct comparisons of the strength of selection at the molecular level across species by controlling for interspecies variation in amino acid usage and GC content. As evidence to support their claim that CAIS better controls for GC content and amino acid usage across species, they note that CAIS has only a weak positive correlation with GC% (that does not stand up to multiple hypothesis testing correction) while CAI has a clear negative correlation with GC%. Using CAIS, they find better adapted species have more disordered protein domains; however, excitement about these findings is dampened because (1) this result is also observed using the effective number of codons (ENC) and (2) there are concerns over the interpretation of CAIS as a proxy for the effectiveness of selection.
Public Review
Summary
The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that attempts to control for differences in amino acid usage and GC% across species. Using their new metric, the authors observe a positive relationship between CAIS and the overall “disorderedness” of a species' protein domains. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection sNe when mutation bias changes across species.
Strengths
(1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance.
(2) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences. A significant improvement over the previous version is the implementation of a software tool for applying this method.
(3) The authors do a better job of putting their results in the context of the underlying theory of CAIS compared to the previous version.
(4) The paper is generally well-written.
Weaknesses
(1) The previously observed correlation between CAIS and body size was due to a bug when calculating phylogenetic independent contrasts. I commend the authors for acknowledging this mistake and updating the manuscript accordingly. I feel that the non-significant correlation between CAIS and body size should remain in the final version of the manuscript. Although it is disappointing that it is not statistically significant, the corrected results are consistent with previous findings (Kessler and Dean 2014).
(2) I thank the authors for providing a more detailed explanation of the theoretical basis of the model. However, I remain skeptical that shifts in CAIS across species indicate shifts in the strength of selection. I am leaving the math from my previous review here for completeness.
As in my previous review, let’s take a closer look at the ratio of observed codon frequencies vs. expected codon frequencies under mutation alone, which was previously notated as RSCUS in the original formulation. In this review, I will keep using the RSCUS notation, even though it has been dropped from the updated version. The key point is this is the ratio of observed and expected codon frequencies. If this ratio is 1 for all codons, then CAIS would be 0 based on equation 7 in the manuscript – consistent with the complete absence of selection on codon usage. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.
I think what the authors are attempting to do is “divide out” the effects of mutation bias (as given by $E_i$), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. GBE 2015, which says that the expected frequency of codon i at selection-mutation-drift equilibrium in gene g for an amino acid with $N_a$ synonymous codons is
where $\Delta M$ is the mutation bias, $\Delta\eta$ is the strength of selection scaled by the strength of drift, and $\phi_g$ is the gene expression level of gene g. In this case, $\Delta M$ and $\Delta\eta$ reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which $\Delta M, \Delta\eta = 0$. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), $E_{i,g}$ could be considered the expected observed frequency of codon i in gene g, $E[O_{i,g}]$.
Let’s rewrite $E_i$ in the form of Gilchrist et al., such that it is a function of mutation bias $\Delta M$. For simplicity we will consider just the two-codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the terms $g_r$ and $1 - g_r$ can be written as
where $\mu_{x \to y}$ is the mutation rate from nucleotide x to y. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias . This can be expressed in terms of the equilibrium GC content by recognizing that
As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process.
If we do this, then
Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of $E_i$ under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for $\Delta\eta$ in equation (1).
We can then calculate the expected RSCUS using equation (1) (using notation $E[O_i]$) and equation (6) for the two-codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon ($\Delta M_{NNG}, \Delta\eta_{NNG} = 0$).
This shows that the expected value of RSCUS for a two-codon amino acid is expected to increase as the strength of selection $\Delta\eta$ increases, which is desired. Note that $\Delta\eta$ in Gilchrist et al. is formulated in terms of selection against a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If $\Delta\eta = 0$ (i.e. selection does not favor either codon), then $E[RSCUS] = 1$. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if $sN_e$ (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances.
Consider our 2-codon amino acid scenario. You can see how changing GC content without changing selection can alter the CAIS values calculated from these two codons. Particularly problematic appear to be cases of extreme mutation biases, where CAIS tends toward 0 even for higher absolute values of the selection parameter. Codon usage for the majority of the genome will be primarily determined by mutation biases, with selection being generally strongest in relatively few highly-expressed genes. Strong enough mutation biases can ultimately overwhelm selection, even in highly-expressed genes, reducing the fraction of sites subject to codon adaptation.
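The two-codon scenario can be explored numerically. The following is a minimal sketch, not the reviewer's simulation code: it assumes the Gilchrist et al. form for codon frequencies with NNG as the reference codon, takes the mutation-only expectation for the G-ending codon to equal the genomic GC content, and uses a Kullback-Leibler divergence between observed and expected frequencies as a stand-in for CAIS. The function names and parameter values are hypothetical.

```python
import math

def codon_freqs(delta_m, delta_eta, phi):
    """Frequencies (non-reference codon, reference codon NNG) under the assumed model."""
    # Weight of the non-reference codon relative to the reference (weight 1).
    w = math.exp(-delta_m - delta_eta * phi)
    return w / (1.0 + w), 1.0 / (1.0 + w)

def two_codon_divergence(gc, delta_eta, phi=1.0):
    """KL divergence between selection-shaped and mutation-only frequencies."""
    # Assumption: mutation alone gives the G-ending reference codon frequency gc,
    # so exp(-delta_m) = (1 - gc) / gc.
    delta_m = math.log(gc / (1.0 - gc))
    observed = codon_freqs(delta_m, delta_eta, phi)
    expected = codon_freqs(delta_m, 0.0, phi)
    return sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

if __name__ == "__main__":
    # Selection is held fixed while GC content (mutation bias) varies.
    for gc in (0.05, 0.41, 0.50, 0.95):
        print(f"GC={gc:.2f}  divergence={two_codon_divergence(gc, delta_eta=-1.0):.4f}")
```

With selection held fixed, the divergence in this toy model is larger at intermediate GC content than at the extremes, which is the pattern described above.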
Peer review images 1 to 4. Panel titles: CAIS (Low Expression), CAIS (Average Expression), CAIS (High Expression).
If we treat the expected codon frequencies as genome-wide frequencies, then we are basically assuming the genome is made up entirely of a single 2-codon amino acid, with selection on codon usage being uniform across all genes. This is obviously not true, but I think it shows some of the potential limitations of the CAIS approach. Based on these simulations, CAIS seems best employed under specific scenarios. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content around 0.41, so I suspect their results are okay (assuming things like GC-biased gene conversion are not an issue). Outliers in GC content probably are best excluded from the analysis.
Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids. One potential challenge to CAIS is the non-monotonic changes in codon frequencies observed in some species (again, see Shah and Gilchrist 2011 and Gilchrist et al. 2015).
-
Author response:
The following is the authors’ response to the original reviews.
In addition to our responses to reviewer suggestions below, a minor bug in the calculation of CAIS was brought to our attention by a reader of our preprint. We have corrected this bug and rerun analyses, whose results became slightly stronger as noise was removed. While we were doing that, someone pointed out to us that our equations were almost the same as Kullback-Leibler divergence, which explains why our metric performed so well. We have made the numerically trivial (see before vs. after figure below) mathematical change to use Kullback-Leibler divergence instead, and now have a better story, with a solid basis in information theory, as to why CAIS works.
Author response image 1.
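For reference, the Kullback-Leibler divergence between an observed distribution $O$ and an expected distribution $E$ over codons $i$ has the standard form below. How the manuscript weights this quantity across amino acids is not reproduced here, and the symbols $O_i$ and $E_i$ are borrowed from the reviews rather than from the manuscript's equations.

```latex
D_{\mathrm{KL}}(O \,\|\, E) = \sum_{i} O_i \log \frac{O_i}{E_i}
```

The divergence is 0 when $O_i = E_i$ for every codon and positive otherwise.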
Unfortunately, we discovered a second bug that caused our PIC correction code to fail to perform the needed correction for phylogenetic confounding. The previously reported correlation between CAIS (or ENC) and body mass no longer survives PIC-correction. We have therefore removed this analysis from the manuscript. Our story now stands more on the theoretical basis of CAIS and ENC, and less on post facto validation, than it previously did. We now also present CAIS and ENC on a more equal footing. ENC results are slightly stronger, while CAIS has the complementary advantage of correcting for amino acid frequencies.
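To illustrate what a PIC correction computes, here is a minimal sketch of Felsenstein's phylogenetically independent contrasts on a toy tree. It is not the code used in the manuscript; the tree encoding, function name, and trait values are hypothetical.

```python
import math

def contrasts(node, out):
    """Return (trait value, branch length) for `node`, appending contrasts to `out`.

    Hypothetical encoding: a tip is (trait_value, branch_length); an internal
    node is (left_child, right_child, branch_length).
    """
    if len(node) == 2:  # tip
        return node
    left, right, length = node
    x1, v1 = contrasts(left, out)
    x2, v2 = contrasts(right, out)
    # Standardized contrast between the two descendant lineages.
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Ancestral estimate: average weighted by inverse branch length.
    x = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    # Lengthen the branch to account for uncertainty in the estimate.
    return x, length + v1 * v2 / (v1 + v2)

if __name__ == "__main__":
    # ((A:1, B:1):1, C:2) with trait values A=1, B=3, C=5; root branch length 0.
    tree = (((1.0, 1.0), (3.0, 1.0), 1.0), (5.0, 2.0), 0.0)
    found = []
    contrasts(tree, found)
    print(found)
```

Correlating the contrasts of two traits, rather than the raw species values, is what removes the signal of shared ancestry.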
The work involved in these changes, as well as some of the responses to reviews below, justifies changing the second author into a co-first author, and adding an additional coauthor (Hanon McShea) who discovered the second bug.
Reviewer #1 (Public Review):
In this manuscript, the authors propose a new codon adaptation metric, Codon Adaptation Index of Species (CAIS), which they present as an easily obtainable proxy for effective population size. To permit between-species comparisons, they control for both amino acid frequencies and genomic GC content, which distinguishes their approach from existing ones. Having confirmed that CAIS negatively correlates with vertebrate body mass, as would be expected if small-bodied species with larger effective populations experience more efficient selection on codon usage, they then examine the relationship between CAIS and intrinsic structural disorder in proteins.
The idea of a robust species-level measure of codon adaptation is interesting. If CAIS is indeed a reliable proxy for the effectiveness of selection, it could be useful to analyze species without reliable life history- or mutation rate data (which will apply to many of the genomes becoming available in the near future).
A key question is whether CAIS, in fact, measures adaptation at the codon level. Unfortunately, CAIS is only validated indirectly by confirming a negative correlation with body mass. As a result, the observations about structural disorder are difficult to evaluate.
As discussed in the preamble above, we have replaced the body mass validation with a stronger theoretical basis in information theory.
A potential problem is that differences in GC between species are not independent of life history. Effective population size can drive compositional differences due to the effects of GC-biased gene conversion (gBGC). As noted by Galtier et al. (2018), genomic GC correlates negatively with body mass in mammals and birds. It would therefore be important to examine how gBGC might affect CAIS, and to what extent it could explain the relationship between CAIS and body mass.
Suppose that gBGC drives an increase in GC that is most pronounced at 3rd codon positions in high-recombination regions in small-bodied species. In this case, could observed codon usage depart more strongly from expectations calculated from overall genomic GC in small vertebrates compared to large ones? The authors also report that correcting for local intergenic GC was unsuccessful, based on the lack of a significant negative relationship with body mass (Figure 3D). In principle, this could also be consistent with local GC providing a relatively more appropriate baseline in regions with high recombination rates. Considering these scenarios would clarify what exactly CAIS is capturing.
Figure 3 (previously Supplementary Figures S5A and S5B) shows that CAIS is negligibly correlated with %GC (not robust to multiple comparisons correction), and ENC not at all. We believe this is evidence against the possibility brought up by the reviewer, i.e. that Ne might affect gBGC (and hence global %GC). This relationship, if present, could act as a confounding effect, but it is not present within our species dataset.
Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that non-selective forces, including gBGC as well as conventional mutation biases, vary among species, and that they rather than selection determine each species’ genome-wide %GC. By correcting for genome-wide %GC, CAIS and ENC correct for both mutation bias and gBGC, in order to isolate the effects of selection.
This argument, based on an average genomic region, is vulnerable to gene-rich genomic regions having differentially higher recombination rates and hence GC-biased gene conversion. However, we do not see the expected positive correlation between |local GC - global GC| and CAIS (see new Figure 5), again suggesting that gene conversion strength is not a confounding factor acting on CAIS.
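The quantity |local GC - global GC| is simple to compute once sequences are in hand. The sketch below uses hypothetical function names and toy sequences; it is not taken from the manuscript's pipeline.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    return gc / total if total else float("nan")

def local_global_gc_gap(local_seq, genome_seq):
    """Absolute difference between local and genome-wide GC fractions."""
    return abs(gc_fraction(local_seq) - gc_fraction(genome_seq))

if __name__ == "__main__":
    # Toy sequences standing in for a local region and a whole genome.
    print(local_global_gc_gap("GGCCGA", "ATGCATGCAT"))
```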
Given claims about "exquisitely adapted species", the case for using CAIS as a measure of codon adaptation would also be stronger if a relationship with gene expression could be demonstrated. RSCU is expected to be higher in highly expressed genes. Is there any evidence that the equivalent GC-controlled measure behaves similarly?
Correlations with gene expression are outside the scope of the current work, which is focused on producing and exploiting a single value of codon adaptation per species. It is indeed possible that our general approach of using Kullback-Leibler divergence to correct for genomic %GC could be useful in future work investigating differences among genes.
The manuscript is overall easy to follow, though some additional context may be helpful for the general reader. A more detailed discussion of how this work compares to the approach taken by Galtier et al. (2018), which accounted for GC content and gBGC when examining codon preferences, would be appropriate, for example. In addition, it would have been useful to mention past work that has attempted to explicitly quantify selection on codon usage.
One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences as a function of species. Our approach might therefore be robust to scenarios where different genes have different codon preferences (see Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who, like us, found that the degree of selection on codon usage depended on Ne.
Reviewer #2 (Public Review):
## Summary
The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that controls for differences in amino acid usage and GC% across species. Using their new metric, the authors find a previously unobserved negative correlation between the overall adaptiveness of codon usage and body size across 118 vertebrates. As body size is negatively correlated with effective population size and thus the general strength of natural selection, the negative correlation between CAIS and body size is expected. The authors argue this was previously unobserved due to failures of other popular metrics such as Codon Adaptation Index (CAI) and the Effective Number of Codons (ENC) to adequately control for differences in amino acid usage and GC content across species. Most surprisingly, the authors also find a positive relationship between CAIS and the overall "disorderedness" of a species' protein domains. As some of these results are unexpected, which is acknowledged by the authors, I think it would be particularly beneficial to work with some simulated datasets. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection $sN_e$ when the mutation bias changes across species.
## Strengths
(1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance (see Cope et al. Biochimica et Biophysica Acta - Biomembranes 2018 for a clear example of this).
We now cite Cope et al. as an example of how amino acid composition can act as a confounding factor.
(2) The authors present numerous analyses using both ENC and mean CAI as a comparison to CAIS, helping give a sense of how CAIS corrects for some of the issues with these other metrics. I also enjoyed that they examined the previously unobserved relationship between codon usage bias and body size, which has bugged me ever since I saw Kessler and Dean 2014. The result comparing protein disorder to CAIS was particularly interesting and unexpected.
Unfortunately, our previous PIC correction code was buggy, and in fact the relationship with body size does not survive PIC correction (although it is strong prior to PIC correction). We have therefore removed it from the paper. However, the more novel result on protein disorder remains strong.
(3) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences.
## Weaknesses
(1) The main weakness of this work is that it lacks simulated data to confirm that it works as expected. This would be particularly useful for assessing the relationship between CAIS and the overall effect of protein structure disorder, which the authors acknowledge is an unexpected result. I think simulations could also allow the authors to assess how their metric performs in situations where mutation bias and natural selection act in the same direction vs. opposite directions. Additionally, although I appreciate their comparisons to ENC and mean CAI, a comparison to other popular codon metrics for calculating the overall adaptiveness of a genome (e.g. dos Reis et al.'s $S$ statistic, which is a function of tRNA Adaptation Index (tAI) and ENC) may be more appropriate. Even if results are similar to $S$, CAIS has a noted advantage that it doesn't require identifying tRNA gene copy numbers or abundances, which I think are generally less readily available than genomic GC% and protein-coding sequences.
The main limitation of dos Reis’s test in our view is that, like the better versions of CAI, it requires comparable orthologs across species. See also the discussion below re the benefits of proteome-wide approach. We now also note the advantage of not needing tRNA gene copy numbers and abundances.
Simulated datasets would be great, but we think it a nice addition rather than must-have, in particular because we are skeptical about whether our understanding of all relevant processes is good enough such that simulations would add much to our more heuristic argument along the lines of Figure 2. E.g. the complications of Gingold et al. 2014 cited above are pertinent, but incorporating them would make simulations quite involved. Instead, we now have a stronger theoretical justification for CAIS grounded in information theory. We have significantly expanded discussion of Figure 2 to give a clearer idea of the conceptual underpinnings of CAIS and ENC.
The authors mention the selection-mutation-drift equilibrium model, which underlies the basic ideas of this work (e.g. higher $N_e$ results in stronger selection on codon usage), but a more in-depth framing of CAIS in terms of this model is not given. I think this could be valuable, particularly in addressing the question "are we really estimating what we think we're estimating?"
Let's take a closer look at the formulation for RSCUS. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.
I think what the authors are attempting to do is "divide out" the effects of mutation bias (as given by $E_i$), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. MBE 2015, which says that the expected frequency of codon i at selection-mutation-drift equilibrium in gene g for an amino acid with Na synonymous codons is
where $\Delta M$ is the mutation bias, $\Delta\eta$ is the strength of selection scaled by the strength of drift, and $\phi_g$ is the gene expression level of gene g. In this case, $\Delta M$ and $\Delta\eta$ reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which $\Delta M, \Delta\eta = 0$. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), $E_{i,g}$ could be considered the expected observed frequency of codon i in gene g, $E[O_{i,g}]$.
Let’s rewrite $E_i$ in the form of Gilchrist et al., such that it is a function of mutation bias $\Delta M$. For simplicity we will consider just the two-codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the terms $g_r$ and $1 - g_r$ can be written as
where $\mu_{x \to y}$ is the mutation rate from nucleotide x to y. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias . This can be expressed in terms of the equilibrium GC content by recognizing that
As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process.
If we do this, then
Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of $E_i$ under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for $\Delta\eta$ in equation (1).
We can then calculate the expected RSCUS using equation (1) (using notation $E[O_i]$) and equation (6) for the two-codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon ($\Delta M_{NNG}, \Delta\eta_{NNG} = 0$).
This shows that the expected value of RSCUS for a two-codon amino acid is expected to increase as the strength of selection $\Delta\eta$ increases, which is desired. Note that $\Delta\eta$ in Gilchrist et al. is formulated in terms of selection *against* a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If $\Delta\eta = 0$ (i.e. selection does not favor either codon), then $E[RSCUS] = 1$. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if $sN_e$ (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content ranging around 0.41, so I suspect their results are okay.
Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids.
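As a hedged sketch of the two-codon argument, assume the standard Gilchrist et al. form with NNG as the reference codon ($\Delta M = \Delta\eta = 0$) and write $\Delta M$ and $\Delta\eta$ for the non-reference codon. The notation below is an assumption and may differ from that of the original review.

```latex
\begin{align*}
E[O_{i,g}] &= \frac{e^{-\Delta M_i - \Delta\eta_i \phi_g}}{\sum_{j=1}^{N_a} e^{-\Delta M_j - \Delta\eta_j \phi_g}} \\
E_i &= \frac{e^{-\Delta M}}{1 + e^{-\Delta M}} \qquad (\text{mutation only, } \Delta\eta = 0) \\
E[\mathrm{RSCUS}_i] &= \frac{E[O_i]}{E_i}
  = \frac{e^{-\Delta\eta\,\phi}\,\bigl(1 + e^{-\Delta M}\bigr)}{1 + e^{-\Delta M - \Delta\eta\,\phi}}
\end{align*}
```

Setting $\Delta\eta = 0$ gives $E[\mathrm{RSCUS}_i] = 1$, while for $\Delta\eta \neq 0$ the ratio still depends on $\Delta M$, which is the dependence on mutation bias noted above.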
We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. While we keep our more heuristic presentation, our revised manuscript now more clearly acknowledges that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. The reason that we believe our approach worked despite this is that we think the phenomenon is driven by what is shown in Figure 2. I.e., where Ne makes a difference is by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene. We have made multiple changes to the text to make this point clearer.
Another minor weakness of this work is that although the method is generally applicable to any species with an annotated genome and the code is publicly available, the code itself contains hard-coded values for GC% and amino acid frequencies across the 118 vertebrates. The lack of a more flexible tool may make it difficult for less computationally-experienced researchers to take advantage of this method.
Genome-wide %GC values are hard-coded because they were taken from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The more complicated code used to calculate the intergenic %GC, and the code used to calculate amino acid frequencies, are located at https://github.com/MaselLab/CodonAdaptation-Index-of-Species. Luckily, someone else just wrote a simpler end-to-end pipeline for us, on the basis of our preprint. We now note this in the Acknowledgements, and link to it: https://github.com/gavinmdouglas/handy_pop_gen/blob/main/CAIS.py.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This is a valuable study in which the authors provide an expression profile of the human blood fluke, Schistosoma mansoni. A strength of this solid study is in its inclusion of in situ hybridisation to validate the predictions of the transcript analysis.
-
Reviewer #1 (Public Review):
In this work, the authors provide a valuable transcriptomic resource for the intermediate free-living transmission stage (miracidium larva) of the blood fluke. The single-cell transcriptome inventory is beautifully supplemented with in situ hybridization, providing spatial information and absolute cell numbers for many of the recovered transcriptomic states. The identification of sex-specific transcriptomic states within the populations of stem cells was particularly unexpected. The work comprises a rich resource to complement the biology of this complex system.
Comments on revised version:
I have read through the responses and the revised manuscript. I think together this results in an improved version.
-
Reviewer #2 (Public Review):
Summary:
In this manuscript the authors have generated a single-cell atlas of the miracidium, the first free-living stage of an important human parasite, Schistosoma mansoni. Miracidia develop from eggs produced in the mammalian (human) host and are released into freshwater, where they can infect the parasite's intermediate snail host to continue the life cycle. This study adds to the growing single-cell resources that have already been generated for other life-cycle stages and, thus, provides a useful resource for the field.
Strengths:
Beyond generating lists of genes that are differentially expressed in different cell types, the authors validated many of the cluster-defining genes using in situ hybridization chain reaction. In addition to providing the field with markers for many of the cell types in the parasite at this stage, the authors use these markers to count the total number of various cell types in the organism. Because the authors realized that their cell isolation protocols were biasing the cell types they were sequencing, they applied a second method to help them recover additional cell types.
Schistosomes have ZW sex chromosomes and the authors make the interesting observation that the stem cells at this stage are already expressing sex (i.e. W)-specific genes.
Comments on revised version:
The manuscript has been improved after revisions. The methods, data and analyses broadly support the claims with only minor weaknesses.
-
Author response:
The following is the authors’ response to the original reviews.
eLife assessment
This is a valuable study in which the authors provide an expression profile of the human blood fluke, Schistosoma mansoni. A strength of this solid study is in its inclusion of in situ hybridisation to validate the predictions of the transcript analysis.
We thank the reviewers and the editor for their effort and expertise in reviewing our manuscript. We have made changes based on the reviews and believe this has greatly strengthened our manuscript. We appreciate their insightful comments and suggestions.
Public Reviews:
Reviewer #1 (Public Review):
In this work, the authors provide a valuable transcriptomic resource for the intermediate free-living transmission stage (miracidium larva) of the blood fluke. The single-cell transcriptome inventory is beautifully supplemented with in situ hybridization, providing spatial information and absolute cell numbers for many of the recovered transcriptomic states. The identification of sex-specific transcriptomic states within the populations of stem cells was particularly unexpected. The work comprises a rich resource to complement the biology of this complex system; however, it falls short in some technical aspects of the bioinformatic analyses of the generated sequence data.
(1) Four sequencing libraries were generated and then merged for analysis, however, the authors fail to document any parameters that would indicate that the clustering does not suffer from any batch effects.
We thank the reviewer for this comment which has given us the opportunity to elaborate on this interesting point. Consequently, we have added evidence to show that the data do not suffer from batch effects between samples (e.g. between sorted samples 1 and 4, and unsorted samples 2 and 3). We now show that there are contributions to all clusters from sorted and unsorted samples and highlight the benefits to using both conditions in a cell atlas with unknown cell types.
Accordingly, we have now added the following paragraph to line 153:
There were contributions from sorted and unsorted samples in almost all clusters (except ciliary plates). We found that some cell/tissue types had similar recovery from both methods (e.g. Stem A, Muscle 2, and Tegument), others were preferentially recovered by sorting (e.g. Neuron 1, Neuron 4, and Stem E), and some were depleted by sorting (e.g. Parenchyma 1, Protonephridia, and Ciliary plates) (Supplementary Figure 1, Supplementary Table 4). This variation in recovery, therefore, enabled us to maximise the discovery and inclusion of different cell types in the atlas.
We have now added a Supplementary Figure 1 showing the contribution of sorted and unsorted cells to the Seurat clusters. We have also included a Supplementary Table 4 detailing the cell number contribution for both conditions and the percentages in order to easily compare differential recovery between cell types.
These are added to the manuscript.
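A per-cluster breakdown of this kind can be tabulated directly from cell-level labels. The sketch below is illustrative only: the function name, condition labels, and toy data are hypothetical and do not come from the manuscript's Seurat workflow.

```python
from collections import Counter, defaultdict

def condition_percentages(clusters, conditions):
    """Percentage of cells from each condition within each cluster."""
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1
    return {
        cluster: {cond: 100.0 * n / sum(c.values()) for cond, n in c.items()}
        for cluster, c in counts.items()
    }

if __name__ == "__main__":
    # Toy labels, one entry per cell.
    clusters = ["Stem A", "Stem A", "Neuron 1", "Neuron 1", "Neuron 1", "Ciliary plates"]
    conditions = ["sorted", "unsorted", "sorted", "sorted", "unsorted", "unsorted"]
    for cluster, pct in condition_percentages(clusters, conditions).items():
        print(cluster, pct)
```

A cluster whose cells all come from a single condition, like the toy "Ciliary plates" entry, stands out immediately in such a table.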
(2) Additionally, the authors switch between analysis platforms without a clear motivation or explanation of what the fundamental differences between these platforms are. While in theory, any biologically robust observation should be recoverable from any permutation of analysis parameters, it has been recently documented that the two popular analysis platforms (Seurat in R and scanPy in Python) indeed do things slightly differently and can give different results (https://www.biorxiv.org/content/10.1101/2024.04.04.588111v1). For this reason, I don't think that one can claim that Seurat fails to find clusters resolved by SAM without running a similar pipeline on the cluster alone as was done with SAM/scanPy here. The manuscript itself needs to be checked carefully for misleading statements in this regard.
We thank the reviewer for this comment and agree that it’s important to increase the clarity on this matter. We have added additional detail to explain that results of subclustering Neuron 1 using Seurat and SAM/ScanPy were broadly similar, but that we presented the results from the SAM/ScanPy analysis due to the strengths of SAM in detecting small differences in gene expression (Tarashansky et al., 2019 PMID: 31524596). We have included here the UMAP showing subclustering of Neuron 1 in Seurat for comparison.
Author response image 1.
UMAP showing subclustering of Neuron 1 cluster in Seurat (SCT normalisation, PC = 19, resolution = 0.3).
We’ve added this additional text to the ‘Neuron abundance and diversity’ section on line 220:
We explored whether Neuron 1 could be further subdivided into transcriptionally distinct cells by subclustering (Supplementary Figure 2; Supplementary Table 6) using the self-assembling manifold (SAM) algorithm (Tarashansky et al., 2019) with ScanPy (Wolf et al., 2018), given its reported strength in discerning subtle variation in gene expression (Tarashansky et al., 2019), although a similar topology was subsequently found using Seurat.
(3) Similarly, the manuscript contains many statements regarding clusters being 'connected to', or forming a 'bridge' on the UMAP projection. One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (see Chari and Pachter 2023). To support these types of interpretations, the authors should provide evidence of gene expression transitions that support connectivity as well as stability estimates of such connections under different parameter conditions. Otherwise, these descriptors hold little value and should be dropped and the transcriptomic states simply defined as clusters with no reference to their positions on the UMAP.
We thank the reviewer for this thoughtful comment. We agree and have rephrased those statements accordingly e.g. line numbers 218, 439, 543, and 557.
(4) The case for the clusters as transcriptomically unique identities is not well supported by the dot plots provided. The authors used very permissive parameters to generate marker lists, which hampers the identification of highly specific marker genes. While this permissive approach can allow for extensive lists of upregulated genes for input into STRING/GO analyses, it is less useful for evaluating the robustness of the cluster states. Running Seurat::FindAllMarkers with more stringent parameters would give a more selective set of genes to display and thereby increase the reader's confidence in the validity of the profiles selected as being transcriptomically unique.
The Reviewer is correct in noting that we used a permissive approach to enable a better understanding of the biology of each cluster, based on analysing enriched functions. However, we disagree about the suitability of the approach for finding markers. First, the permissive approach produced longer candidate lists, but those with the best AUC scores for each cluster are at the top of the list for each cluster. Second, some of the markers with lower expression also revealed interesting biology (e.g. Notum in the muscles). Furthermore, we used filtering on the marker genes lists to increase the minimum marker gene scores for analyses such as the GO analyses (details in the GO section of the methods). It’s important to stress that our approach also utilised validation by FISH for top marker genes, as well as biologically informative genes that were lower down the marker gene list.
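For readers unfamiliar with AUC-based marker ranking, the sketch below computes the AUC for one gene as the probability that a randomly chosen cell inside the cluster has higher expression than one outside it, counting ties as one half. It is a plain-Python illustration with a hypothetical function name and toy values, not Seurat's implementation.

```python
def marker_auc(in_cluster, out_cluster):
    """AUC of one gene for separating cluster cells from all other cells."""
    wins = 0.0
    for x in in_cluster:
        for y in out_cluster:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5  # ties count as half
    return wins / (len(in_cluster) * len(out_cluster))

if __name__ == "__main__":
    # Toy expression values inside and outside a cluster.
    print(marker_auc([3.0, 2.5, 4.1], [0.0, 0.0, 1.2, 2.5]))
```

An AUC of 1 marks a gene that perfectly separates the cluster from the rest, while 0.5 marks an uninformative gene.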
(5) Figure 5B shows a UMAP representation of cell positions with a statement that the clustering disappears. As a visual representation of this phenomenon, the UMAP is a very good tool, however, to make this statement you need to re-cluster your data after the removal of this gene set and demonstrate that the data no longer clusters into A/B and C/D.
We’ve added Supplementary Figure 13 to show that after removing WSR and ZSR genes and reclustering, the data no longer clusters in A/B and C/D, even at a higher resolution where clusters appear oversplit.
Also, as a reader, these data beg the question: which genes are removed here? Is there an over-representation of any specific 'types' of genes that could lead to any hypotheses of the function? Perhaps the STRING/GO analyses of this gene set could be informative.
We have performed GO-enrichment analyses on W-specific genes, Z-specific genes, and both together compared to the rest of the genome, but the results were not very informative (see Supplementary Table 13, which we have now added; line 464). This may be due to the large difference in size: there are approximately 900 Z-specific genes (two copies in males, one copy in females) but only approximately 30 W-specific genes, many of which have homologs in the Z-specific region of the genome. Instead, we suggest that tissue-specific regulation of gene dosage compensation is the more likely explanation, as reported for other species (Valsecchi et al. 2018).
(6) How do the proportions of cell types characterized via in situ here compare to the relative proportions of clusters obtained? It does not correspond to the percentages of the clusters captured (although this should be quantified in a similar manner in order to make this comparison direct: 10,686/20,478 = ~50% vs. 7%). How do you interpret this discrepancy? While this is mentioned in the discussion, there is no sufficient explanation as to why you have an overabundance of the stem cells compared to their presence in the tissue. While it is true that you could have a negative selection of some cell types, for example as stated the size of the penetration glands exceeds both that of the 10x capabilities (40 µm) and the 30 µm filters used in the protocol, this does not really address why over half of the captured cells represent 'stem cells'. A more realistic interpretation would be biological rather than merely technical. For example, while the composition of the muscle cells and the number of muscle transcriptomes captured are quite congruent at ~20%, the organism is composed of more than 50% neurons, but only 15% of the transcriptomic states are assigned as neuronal. Could it be that a large fraction of the stem cells are actually neural progenitors? Are there other large inconsistencies between the cluster sizes and the fraction of expected cells? Could you look specifically at early transcription factors that are found in the neurons (or other cell types) within the various stem cell populations to help further refine the precursor/cell type relationships?
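One possible reading of the size argument can be sketched with a hypergeometric over-representation test, the statistic behind many GO-enrichment tools. The example is illustrative only: the genome size, the GO-term size, and the overlap counts are hypothetical, and this is not the tool used to produce Supplementary Table 13.

```python
from math import comb


def hypergeom_pvalue(genome_size, term_size, set_size, overlap):
    """P(X >= overlap) when set_size genes are drawn without replacement
    from a genome in which term_size genes carry the GO term."""
    total = comb(genome_size, set_size)
    upper = min(term_size, set_size)
    tail = sum(
        comb(term_size, k) * comb(genome_size - term_size, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical background: 10,000 genes, 200 of them (2%) annotated with the term.
GENOME, TERM = 10_000, 200

# Roughly three-fold enrichment at two gene-set sizes.
p_small = hypergeom_pvalue(GENOME, TERM, set_size=30, overlap=2)    # 2 of 30 annotated
p_large = hypergeom_pvalue(GENOME, TERM, set_size=900, overlap=54)  # 54 of 900 annotated
```

With these assumed numbers, a similar fold enrichment gives a much smaller p-value for the set of 900 genes than for the set of 30, which is one way a small gene set can return uninformative enrichment results.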
Yes, it is really interesting that more than 50% of cells in the animal are neurons whereas more than 50% of cells in scRNAseq data are stem cells. This dataset provides a unique opportunity to compare tissue composition in the whole animal to the corresponding single cell RNAseq dataset.
To understand this phenomenon, we compiled a table (Supplementary Table 17) showing the percentage of cells from each tissue type in the miracidium (identified via in situ hybridisation of tissue-type marker genes) and in the scRNAseq data.
This table shows that the single cell protocol used in this study negatively selected for nerves and tegument, and positively selected for stem and parenchyma. The composition of the muscle and protonephridia cells and the number of muscle and protonephridia transcriptomes captured are quite congruent.
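The direction and size of this sampling bias can be expressed as a log2 ratio of the scRNAseq proportion to the in situ proportion. The sketch below uses only the approximate figures quoted in this exchange (7% vs. 10,686/20,478 for stem cells, 57% vs. 15% for neural cells, about 20% for muscle in both); the exact values are those in Supplementary Table 17, and the function and variable names are assumptions for illustration.

```python
from math import log2

# Approximate fractions of all cells, as quoted in the review and response.
in_situ = {"stem": 0.07, "neural": 0.57, "muscle": 0.20}
scrnaseq = {"stem": 10_686 / 20_478, "neural": 0.15, "muscle": 0.20}


def log2_representation(tissue):
    """Positive: over-represented in scRNAseq; negative: under-represented."""
    return log2(scrnaseq[tissue] / in_situ[tissue])


bias = {tissue: round(log2_representation(tissue), 2) for tissue in in_situ}
```

A value near zero (muscle) indicates congruent capture, whereas stem cells come out strongly positive and neural cells strongly negative.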
This technical finding is also biologically consistent. For instance, the tegument cells span the body wall muscles, with the cell bodies below and a syncytial layer above. It is not known how the tegument fragments during the dissociation process, and which parts of the cells get packaged by the 10X GEMs. Because of tegumental structure, the cells are likely prone to damage, and therefore we speculate that is why the tegument cells are under-represented in our 10X data. Unusually shaped fragments may not have been captured in 10X GEMs and of those that were, damaged or distressed tegument cells/fragments may have been excluded post-sequencing, by QC filters including cell calling, mitochondrial percentage and low transcript count (e.g. if there was a tegumental fragment with 100 transcripts it would not have passed QC). Stem cells are spherical with a large nucleus:cytoplasm ratio, likely making them more robust during dissociation and more likely to be captured in 10X GEMs.
We don’t think that a large fraction of the stem cells are actually neural progenitors because:
(1) we used previously reported marker genes of different tissue types to identify the single cell RNAseq clusters, e.g. Ago2-1 for stem cells, which has been used in multiple life stages.
(2) The stem cell transcriptomes express many previously reported stem cell marker genes.
(3) We found that the stem cells from the single cell data generally had higher numbers of transcripts than the other cell types which is consistent with the Wang et al. 2013 observation that RNA marker POPO-1 could distinguish germinal (stem) cells from other cell types as they are RNA rich.
(4) We also found higher numbers of ribosomal related transcripts in our stem cell transcriptomes, which is consistent with Pan’s observation that part of the distinct morphology of stem cells is densely packed ribosomes in the cytoplasm.
In order to elaborate on this discussion we have generated new visualisations:
(1) A UMAP of the stem cell marker ago2-1 (Supplementary figure 10), to further illustrate our evidence in classifying the stem cell clusters
(2) A co-expression plot of the stem cell marker ago2-1 with neural marker complexin to confirm that there is little coexpression (the most coexpression being in Neuron 1 and Stem F). We identified that 15.56% of cells in the Stem F cluster show some expression of complexin (neural marker), suggesting that a small fraction of Stem F may be early/precursor neurons, but the gene expression indicates that the majority of cells in Stem F are more likely to be stem cells than any other tissue type. There is little to no complexin expression in the other stem clusters.
(3) Expression plots of the five neurogenins (TFs involved in neuronal differentiation) that we could identify in these data using WormBase ParaSite. Four of the five showed very little expression, and not in specific clusters. The fifth (Smp_072470) showed slightly more expression, though still sparse, mostly across the stem and neural clusters; this is not enough to indicate that any of the stem clusters are neural progenitors.
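A percentage such as the 15.56% of Stem F cells expressing complexin is a per-cluster fraction of cells above a detection cutoff. The following minimal sketch shows how such a fraction, and the corresponding co-expression fraction, could be computed. The cell records, the expression values, and the zero threshold are hypothetical and are not taken from the dataset.

```python
def fraction_expressing(cells, cluster, genes, threshold=0.0):
    """Fraction of cells in `cluster` expressing every gene in `genes`
    above `threshold` (an assumed detection cutoff)."""
    members = [cell for cell in cells if cell["cluster"] == cluster]
    if not members:
        return 0.0
    positive = [
        cell for cell in members
        if all(cell["expr"].get(gene, 0.0) > threshold for gene in genes)
    ]
    return len(positive) / len(members)


# Toy cells (hypothetical values).
cells = [
    {"cluster": "Stem F", "expr": {"ago2-1": 3.1, "complexin": 0.0}},
    {"cluster": "Stem F", "expr": {"ago2-1": 2.4, "complexin": 1.2}},
    {"cluster": "Stem F", "expr": {"ago2-1": 2.9}},
    {"cluster": "Stem F", "expr": {"ago2-1": 0.0, "complexin": 0.8}},
    {"cluster": "Neuron 1", "expr": {"ago2-1": 0.0, "complexin": 4.0}},
]

# Cells expressing the neural marker, and cells expressing both markers.
cpx_in_stem_f = fraction_expressing(cells, "Stem F", ["complexin"])
both_in_stem_f = fraction_expressing(cells, "Stem F", ["ago2-1", "complexin"])
```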
Author response image 2.
Coexpression UMAP showing the expression of stem cell marker Ago2-1 and neural marker complexin.
Author response image 3.
UMAPs showing the expression of five putative neurogenins of S. mansoni.
Reviewer #2 (Public Review):
Summary:
In this manuscript the authors have generated a single-cell atlas of the miracidium, the first free-living stage of an important human parasite, Schistosoma mansoni. Miracidia develop from eggs produced in the mammalian (human) host and are released into freshwater, where they can infect the parasite's intermediate snail host to continue the life cycle. This study adds to the growing single-cell resources that have already been generated for other life-cycle stages and, thus, provides a useful resource for the field.
Strengths:
Beyond generating lists of genes that are differentially expressed in different cell types, the authors validated many of the cluster-defining genes using in situ hybridization chain reaction. In addition to providing the field with markers for many of the cell types in the parasite at this stage, the authors use these markers to count the total number of various cell types in the organism. Because the authors realized that their cell isolation protocols were biasing the cell types they were sequencing, they applied a second method to help them recover additional cell types.
Schistosomes have ZW sex chromosomes and the authors make the interesting observation that the stem cells at this stage are already expressing sex (i.e. W)-specific genes.
Weaknesses:
The sample sizes upon which the in situ hybridization results and cell counts are based are either not stated (in most cases) or are very small (n=3). This lack of clarity about biological replicates and sample sizes makes it difficult for the reader to assess the robustness of the results and the extremely small sample sizes (when provided) are a missed opportunity to explore the variability of the system, or lack thereof.
We have now added more details about the methods we used for validating cell type marker genes by in situ hybridisation. We have added to the methods that ‘We carried out at least three in situ hybridisation experiments for each marker gene we validated (each experiment was a biological replicate). From each experiment we imaged (by confocal microscopy) at least 10 miracidia (technical replicates) per marker gene experiment.’ on line 1036.
In the figure legends we have added the number of miracidia that were screened, and documented the percentage of the screened larvae that showed the in situ gene expression pattern that is seen in the images in the figures, and that we described in the text.
We manually segmented the nuclei of pan tissue marker genes, and we did this for one miracidium in the case of all tissues, except stem cells where we segmented stem cells in five larvae. Manual segmentation of gene expression in a confocal z-stack is very time consuming. We consider that the variability of different cell and tissue types (stereotypy) between miracidia is beyond the scope of this paper and can be investigated in future work.
Although assigning transcripts to a given cell type is usually straightforward via in situ experiments, the authors fail to consider the potential difficulty of assigning the appropriate nuclei to cells with long cytoplasmic extensions, like neurons. In the absence of multiple markers and a better understanding of the nervous system, it seems likely that the authors have overestimated the number of neurons and misassigned other cell types based on their proximity to neural projections.
This is a valid point, and we acknowledge the difficulties of assigning a nucleus to a cell using mRNA expression only and in the absence of a cell membrane marker. We tried to address this issue by labelling the cell membranes using an antibody against beta catenin after the HCR in situ protocol. This method has been used successfully on sections on slides (Schulte et al., 2024), but we failed to get usable results in our miracidia whole-mounts. The beta catenin localisation marked the membranes of the gland cells but didn’t do the same for the neurons or other cell types (see image below).
Author response image 4.
Image showing a maximum intensity projection of a subvolume of a confocal z-stack of a miracidium whole-mount in situ hybridisation (by HCR) for paramyosin counterstained with a beta catenin antibody (1:600 concentration of Sigma C2206). The cell membrane of a lateral gland is clearly labelled, but those of the neurons of the brain and the paramyosin+ muscle cells are not.
Our observation that 57% of the cells in a miracidium are nerves is high compared to the C. elegans hermaphrodite adult, in which 302 out of 959 cells are neurons (Hobert et al., 2016); however, few studies have equivalent data with which to make comparisons. Despite this, and the limitation described above, we believe that we have not overestimated the number of neural cells. During the process of validating the marker genes and closely examining gene expression in hundreds of miracidia, we noted that the nuclei of different tissue types are distinct and recognisable (see figure below). The nuclei of stem, tegument and parenchymal cells are comparatively large and spherical with obvious nucleoli (i). The four nuclei of the apical gland cell are angular, pentagonal in shape and sit adjoining each other (inside red dashed circle, i-iii), those of the two lateral glands are bilaterally symmetrical and surrounded by flask shaped cytoplasm (arrows, iv). The nuclei of the body wall muscle cells are peripheral and flattened on the outer edge (iii). The notum+ muscle cell nuclei are anterior of the apical gland (manuscript Figure 2E). The only other two tissue types are the nerves and protonephridia, and their nuclei are smaller and more compact/condensed. In situ expression of the protonephridia marker suggests that 6 cells make up the protonephridial system (manuscript Figure 4 B&E). Therefore, by process of elimination, the remaining nuclei should belong to neurons. The complexin expression pattern supports this and we counted 209 nuclei that were surrounded by cpx transcript expression. To help the reader interpret this for themselves we have added confocal z-stacks of miracidia where tissue level markers have been multiplexed (supplementary videos 18-20). We counted all tissue type cells individually and the tissue type cell numbers added up to the overall cell count.
Author response image 5.
Image showing the diversity of nucleus morphology between tissue types in the miracidium.
Biologically, it is not surprising that this larva is dominated by neural cells. It must navigate a complex aquatic environment and identify a suitable mollusc host in less than 12 hours. It is a non-feeding vehicle that must deliver the stem cells to a suitable environment where they can develop into the subsequent life cycle stage. Accordingly, the cell type composition reflects this challenge.
The conclusion that germline genes are expressed in the miracidia stem cells seems greatly overstated in the absence of any follow-up validation. The expression scales for genes like eled and boule are more than 3 orders of magnitude smaller than those used for any of the robustly expressed genes presented throughout the paper. These scales are undefined, so it isn't entirely clear what they represent, but neither of these genes is detected at levels remotely high (or statistically significant) enough to survive filters for cluster-defining genes.
Given that germ cells often develop early in embryogenesis and arrest the cell cycle until later in development, and that these transcripts reveal no unspliced forms, it seems plausible that the authors are detecting some maternally supplied transcripts that have yet to be completely degraded.
We agree that the expression of genes such as eled and boule are low. We made this clear in the figure legends and text, and have now added scale information to the figure legends. We did not explore these genes as cluster-defining genes, partly due to their comparatively low levels of expression, but as genes already reported to be important in germ line specification. We found the expression of these genes to be consistent with our hypothesis that the Kappa stem cells may include germ line segregated cells, but our hypothesis does not rest on these lower-expressed genes.
It is certainly possible that we have detected some maternally supplied transcripts in the miracidia stem cells. However, experiments to distinguish between zygotic and maternal transcripts using metabolic labelling of zygotic transcripts (e.g. Fishman et al. 2023) would be hard in this species due to the hard egg capsule and its ectolecithal embryogenesis. Therefore, this is out of scope for this work, but it would be a very interesting topic to follow up on and develop tools for.
We have added these sentences to the Discussion ln 746 ‘Intriguingly, the presence of spliced-only copies of the germline defining genes eled and boule could suggest that they are maternal transcripts that have been restricted to the primordial germ cells during embryogenesis, as is the case in Zebrafish embryos (Fishman et al., 2023). An alternative explanation is that unspliced transcripts exist for these lowly expressed genes but their abundance was below our threshold for detection.’
Reviewer #1 (Recommendations For The Authors):
Ln 138: specify the version of Seurat used, and reference the primary papers for this software. Also, from the dot plot shown here, these do not all appear to be supported by unique gene sets. How was the final clustering determined? This information is in the methods section, but a summary here could make it more robust for the readership.
In addition to the details in the methods section, we have added the version and referenced the version-specific primary paper for Seurat when it is first mentioned. We have also summarised the methods used to select the final clustering when we first present the results to aid in clarity.
We added to line 140 ‘Using Seurat (version 4.3.0) (Hao et al., 2021), 19 distinct clusters of cells were identified, along with putative marker genes best able to discriminate between the populations (Figure 1C & D and Supplementary Table 2 and 3). We used Seurat’s JackStraw and ElbowPlot, along with molecular cross-validation to select the number of principal components, and Seurat’s clustree to select a resolution where clusters were stable (Hao et al., 2021).’
Ln 147: isn't seven stem cell clusters a lot? See comment in public review.
We did not have preconceived expectations of the number of stem cell clusters, and were guided by the data and gene expression. In doing so we also discovered that four of those clusters were likely only two ‘biologically or functionally distinct’ clusters, but these split into four clusters based on the expression of genes on the sex-specific regions of the chromosomes, which was both unexpected and interesting.
Figure 1D: gene model names are un-informative for the general reader. Can you provide any putative gene identities here to render this plot interpretable? For example in the main text you state that Smp-085540 is paramyosin; please use this annotation in all your visual material (as is used in Figure 2A).
We have added gene names to the dotplots in all figures with the locus identifier (minus the ‘Smp’ prefix) in brackets after the gene name.
Ln 191:196 Identification of the two muscle clusters as circular and longitudinal muscles is very well supported. However, it would be interesting to look specifically at the genes that are different here. Did the authors attempt to specifically pull out genes differentially expressed between these two groups, or only examine the output of FindAllMarkers at this point?
We did indeed look specifically for genes differentially expressed between the muscle clusters, the results of which can be found in Supplementary Table 5 (Line 206). This analysis revealed “Wnt-11-1 (circular) and MyoD (longitudinal) were among the most differentially expressed genes”, which were important findings in our understanding of the muscle cells in the miracidium.
Ln 207: "connected to stem F" - does this refer specifically to their relative positions on the UMAP in Figure 1C? One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (public review).
We agree, and have rephrased accordingly.
Ln 209:211: Here the authors switch from Seurat (R) as an analysis package, to SAM (python) for subset analysis of one large neural cluster. The results indicate that there may be small populations of transcriptomically distinct neural subtypes also within the neural1 cluster, but that the vast majority of these cells do not express unique transcriptomic profiles. Also in the supplementary material for this (SF1) there is a question of whether or not there is any clustering according to batch effects.
In general, I find the neuronal section a little difficult to follow and it is unclear how many unique profiles are present and which are documented with in situ. I would recommend re-running the analysis on the entire neural subset (n1:5: complexin positive) and generating an inventory of putatively unique neural states with the associated in situ validation altogether in a main figure.
In response to comments above we have both clarified our reasoning for using SAM analysis, and presented more details on possible batch effects. We have gone through the neural system results in order to make it clearer for the reader to follow.
Ln 236: here the authors introduce a STRING analysis for the first time. Also, this method requires some introduction for the general audience in terms of its goals and general functionality and output.
We used STRING analysis on some well defined clusters to provide additional clues about function. At the first mention of STRING (neuron 3 results) we have added the following statement to give more introduction to the reader: “STRING analysis of the top 100 markers of Neuron 3 predicted two protein interaction networks with functional enrichment: ….”
Ln. 280:281. It is unclear why Steger et al is referenced here. In what way does a description of neural and glandular cell transcriptomic similarity in a cnidarian inform your data on a member of the Platyhelminthes? (which should also be referenced in the introduction: to which phylogenetic lineage does Schistosoma belong).
We have now added that the Schistosoma belong to the Platyhelminths on the first line of the introduction.
Ln 295 we have added ‘We expected to find a discrete cluster(s) for the penetration glands, and that it would show similarities to the neural clusters (as glandular cells arise from neuroglandular precursor cells in other animals, such as the sea anemone, Nematostella vectensis, Steger et al., 2022).’
Ln 339: explain the motivation for generating a further plate-based scRNA of the ciliary plates.
We wished to include the ciliary plates alongside the gland cells for plate-based RNAseq because they are unique to the miracidium stage and we wanted to make sure we had captured them in this study.
Ln 345: Define the tegumental cells for the general reader.
We have added further description of tegument cells in the introduction and tegument results section (e.g. on lines 61 and 366).
Ln 365: "this cluster" is imprecise. Which cluster are we looking at here? Also: were flame cells already described morphologically at this stage, or is this the first description of the protonephridial system for this stage of the life cycle?
We have now clarified which cluster we are talking about in the text. The flame cells have been described using TEM before (Pan, 1980).
Stem Cells: also here you refer to cells as 'bridge' which refers to the configuration of the UMAP. While this is likely a biological representation of a different differentiation state, the nomination of this based solely on the UMAP representation should be avoided.
We have rephrased this.
Figure 5B: What is neuron 6? This was Neuron 3 in Figure 1.
Thank you for spotting these mistakes in the labelling, we have corrected them now.
Ln 421:438 - Here you represent a UMAP representation of the cell positions, but state that the clustering disappears. See comment in Public Review.
Modified accordingly, see response in public review.
Ln 472 "Cells in stem E, F, and G in silico clusters might be stressed/damaged/dying cells or cells in transcriptionally transitional states." Is there any evidence supporting either of these conclusions?
We found that 15.56% of the cells in Stem F expressed the neural marker complexin, leading us to consider the possibility that a fraction of these cells may be neural precursors. Stem F also had some cells with a mitochondrial % near the maximum threshold we set, suggesting they could be experiencing some stress. Since we could not identify clear markers for these clusters, their function and a more specific identity, beyond ‘stem’, is not yet known.
That the two stem cell populations contribute to different parts of the next life cycle stage is interesting. The combined analysis suffers from the same issues as the previous analysis in terms of sample distribution; are the 'grey' sporocyst cells also contributing to the stem A/B (kappa) C/D (delta/phi) clusters? This is not possible to tell from the plot as the miracidia may simply be plotted on the top. A different representation of sample contribution to clusters is warranted.
We have made an alternative visualisation here to demonstrate that the miracidia cells are not plotted on top of the sporocyst stem cells. Unfortunately this visual is hampered as there is not a straightforward way to split the panels. In the figure below, the left pane shows the miracidia cells, and the right pane shows the sporocyst cells. Below that, we have included the original figure for comparison. It can be clearly seen that there are three miracidia tegument cells in the sporocyst tegument cluster, and one sporocyst cell in the miracidia stem cells (Stem E), but the miracidia A/B and C/D stem cells are not plotted on top of any sporocyst cells.
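Sample contribution to clusters can also be reported numerically rather than visually, which avoids any question of plotting order. The sketch below tabulates, for each cluster, the fraction of cells that come from each life-cycle stage. The cluster labels and counts are hypothetical and are not the values from the combined analysis.

```python
from collections import Counter


def contribution_table(cells):
    """Given (cluster, stage) pairs, return the fraction of each cluster's
    cells that originate from each stage."""
    counts = Counter(cells)
    totals = Counter(cluster for cluster, _ in cells)
    return {
        (cluster, stage): n / totals[cluster]
        for (cluster, stage), n in counts.items()
    }


# Toy input (hypothetical): one (cluster, stage) pair per cell.
cells = (
    [("Stem A/B", "miracidium")] * 4
    + [("Stem E", "miracidium")] * 3
    + [("Stem E", "sporocyst")] * 1
    + [("Sporocyst stem", "sporocyst")] * 5
)

table = contribution_table(cells)
```

A cluster made up of a single stage has one entry equal to 1.0, and a mixed cluster shows the split directly.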
Author response image 6.
Methods: Why is the multiplet rate estimate at >50% for the unsorted sample?
We have added more detail on this: “The estimated doublet rate was calculated based on 10X loading guidelines and adjusted for our sample concentrations”.
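As a rough illustration of how a loading-based estimate scales, the sketch below applies a linear rule of thumb in which the multiplet fraction grows in proportion to the number of cells recovered. The constant of about 0.8% per 1,000 recovered cells is an assumption based on commonly quoted 10x guidance, not a value taken from the manuscript, and the linear form is only an approximation; the authors' own calculation, adjusted for sample concentration, may differ.

```python
def expected_multiplet_rate(cells_recovered, rate_per_1000=0.008):
    """Linear rule of thumb for the expected multiplet fraction.

    rate_per_1000 is an assumed constant (about 0.8% per 1,000 cells
    recovered); the result is capped at 1.0.
    """
    if cells_recovered < 0:
        raise ValueError("cells_recovered must be non-negative")
    return min(1.0, rate_per_1000 * cells_recovered / 1000.0)


# Hypothetical recovery targets.
rate_5k = expected_multiplet_rate(5_000)
rate_10k = expected_multiplet_rate(10_000)
```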
Reviewer #2 (Recommendations For The Authors):
(1) The manuscript would benefit from a more careful consideration of what was already known based on previous literature, which would help the authors to better put their results in context. For example, previous work suggested that one of the sporocyst stem cell populations (phi) gives rise to tegument and other temporary larval structures; this appears not to be mentioned here. The model in Figure 7 suggests that two of the stem cell populations are gone at day 15 post-infection; the literature shows that those cells can still be detected at this stage (there are just far fewer of them).
We have added the definition of Kappa, Delta and Phi as per Wang et al (2018) in the stem cell results p13 ln 428.
We have amended Figure 7 to include further elements from the Wang et al (2018) paper that show that mother sporocyst stem cells classified as delta and phi are still detectable on day 15 post-infection in mother sporocysts.
We intentionally didn’t put too much emphasis on fitting our data to the model of Wang et al (2018), because (a) it’s a different life cycle stage, (b) the single cell data the model was based on was from 35 stem cells and gathered using a different method, and (c) more recent data (Diaz, Attenborough et al. 2024) with 119 stem cells from sporocysts did not recover the same populations of stem cells. We therefore linked our data to previous literature where it was relevant but focused on being led by the data we gathered (>10,000 stem cells).
(2) To add some detail to the public comment about the lack of clarity about sample sizes and biological replicates, and how this leads to questions about the robustness of the results, Figures 4 B and F show the expression pattern for the same parenchyma marker (Smp_318890) in two different samples. The patterns appear quite distinctive. In B, the cell bodies are so clearly labeled that the signal appears oversaturated. In F the cell bodies are barely apparent. Based on the single-cell clustering, it should be possible to distinguish between Parenchyma clusters 1 and 2 based on the levels of this transcript. Careful quantification of signal intensity from multiple samples across multiple experiments might enable the authors to detect such differences.
The reason the expression patterns look different between panels 4Bii and 4F is that in 4Bii we have manually segmented the nuclei of the parenchymal cells in order to count them, whereas in the images in 4F there is no segmentation. We have made this clearer in this legend now, and also in the legends of Figures 2, 3, and 5. If there was any signal intensity difference between parenchyma 1 and 2 cells based on expression of the marker gene, Smp_318890, it was not obvious. We carried out 6 experiments for parenchyma markers, multiplexing the pan-parenchyma marker, Smp_318890, with markers for parenchyma 2, but we were unable to distinguish between the two populations.
(3) The authors find that the "somatic" stem cells in miracidia seem to combine attributes of the previously defined delta and phi stem cells from sporocysts. Because the 3 classes of sporocyst stem cells were defined by expression of nanos-2 and fgfrA, using those probes in in-situ experiments could have helped them resolve whether or not the miracidial cells represent precursors that can adopt either fate or if the heterogeneity is already present in miracidia.
In silico expression of the marker genes for the 3 classes of sporocyst stem cells didn’t support those three classes in the miracidia stem cells (See supplementary table 10). We further subclustered the delta/phi cells to see if we could recover separate delta and phi populations but we were unable to do so. We therefore did not pursue in situ experiments of these genes. We instead prioritised cluster-defining genes in the miracidia stem cell populations rather than cluster defining genes in the sporocyst (defined by Wang et al., 2018), but we still explored these in silico. For example, instead of using klf to define Kappa (Wang et al 2018), we used UPPA to validate the Kappa population as it showed similar expression to klf but higher expression levels and was specific to that population. However, like Wang et al 2018, we did use p53, which is a cluster marker of delta and phi in sporocysts, as it showed clear and high expression in our miracidia delta/phi population. We were guided by our data and our knowledge of the literature. More in depth single cell RNAseq is needed from the mother and daughter sporocyst stages to understand the heterogeneity and fates of these stem populations.
(4) Scale bars should be included throughout the figures and the scale should be defined either on the figure or in the legend. Similarly, all the scales used for velocity and expression analysis should be defined.
We have added scale bars to all figures and legends.
One of the statements “Gene expression has been log-normalised and scaled using Seurat (v. 4.3.0)”, “Gene expression has been normalised (CPM) and log-transformed using scvelo (v. 0.2.4)”, or “Library size was normalised and gene expression values were log-normalised using SAM (v1.0.1) and Scanpy (v1.8.2)” has been added to each figure, as appropriate.
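For readers unfamiliar with these scales, the sketch below shows the general form of library-size normalisation followed by a log transform. A scale factor of 10,000 corresponds to the default of Seurat's LogNormalize method, and 1,000,000 gives counts per million (CPM). The function name and the example counts are hypothetical, and the code illustrates the arithmetic only, not the exact pipeline behind each figure.

```python
from math import log1p


def log_normalise(counts, scale_factor=10_000):
    """Divide each count by the cell's library size, multiply by
    scale_factor, and apply the natural log of (1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [log1p(c / total * scale_factor) for c in counts]


# Hypothetical raw UMI counts for four genes in one cell (library size 100).
cell = [0, 3, 7, 90]

per_10k = log_normalise(cell)                          # scale factor 10,000
per_million = log_normalise(cell, scale_factor=1_000_000)  # CPM, then log1p
```

Because the two scale factors differ by a factor of 100, values reported on the two scales are not directly comparable, which is why stating the method in each legend matters.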
(5) The table entitled In situ hybridization probes (Supplementary Table 15) contains no probe sequences, so any interested reader wishing to use these probes would have to design their own. To ensure the reproducibility of the results presented here, the authors should provide the probe sequences they used.
In Supplementary Table 15 we have added the Molecular Instruments Lot number of all the probes used. Anyone wanting to repeat the experiment can order the same probes from the company.
(6) It is unclear how useful the supplemental figures showing the STRING enrichment analyses will be for readers. Unannotated Smp gene identifiers provide no way to help readers digest the information in these hairballs. It would probably be best to replace the Smp names with useful annotations based on their orthologs; if not, these figures could probably be dropped entirely. (Also, the bottom panel of Supplementary Figure 7 has the word "Lorem" embedded on one of the connecting nodes.)
“Lorem” has been removed.
Many of the genes in these analyses do not have short descriptions, therefore we have used Smp gene identifiers in the STRING analysis supplementary figures. These ‘Smp_’ numbers can be used to search WormBase Parasite, where a description can be found and the history of the gene ID traced. This latter function facilitates searching for these genes in the literature and consistency between versions as gene models are updated.
Minor edits
(1) Figures 4A-D aren't cited in the text until after 4E-F are. It seems like moving the section on protonephridial cells (line 364) before the section on tegumental cells (line 345) better reflects the order of the figures.
Thank you for flagging this. We have updated the in-text citations of Figure 4.
(2) In-text references to Sarfati et al, 2021 should be to Nanes Sarfati, as listed in the references. Poteaux et al 2023 is cited in the text, but not in the reference list.
Both of these have been fixed.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This important work provides evidence that glutamate and GABA are released from different synaptic vesicles at supramammillary axon terminals onto granule cells of the dentate gyrus. The study uses complementary electrophysiological and anatomical experimental approaches. Together, these provide solid evidence that the co-release of glutamate and GABA from different vesicles within the same terminal could modulate granule cell firing in a frequency-dependent manner, although thorough elimination of alternative mechanisms would have strengthened the study. The work will be of interest to neuroscientists investigating co-release of neurotransmitters in various synapses in the brain and those interested in subcortical control of hippocampal function.
-
Reviewer #1 (Public Review):
This study of mixed glutamate/GABA transmission from axons of the supramammillary nucleus to dentate gyrus seeks to sort out whether the two transmitters are released from the same or different synaptic vesicles. This conundrum has been examined in other dual-transmission cases and even in this particular pathway, there are different views. The authors use a variety of electrophysiological and immunohistochemical methods to reach the surprising (to me) conclusion that glutamate and GABA-filled vesicles are distinct yet released from the same nerve terminals. The strength of the conclusion rests on the abundance of data (approaches) rather than the decisiveness of any one approach, and I came away believing that the boutons may indeed produce and release distinct types of vesicles, but have reservations. Accepting the conclusion, one is now left with another conundrum, not addressed even in the discussion: how can a single bouton sort out VGLUTs and VIAATs to different vesicles, position them in distinct locations with nm precision, and recycle them without mixing? And why do it this way instead of with single vesicles having mixed chemical content? For example, could a quantitative argument be made that separate vesicles allow for higher transmitter concentrations? I feel the paper needs to address these problems with some coherent discussion, at minimum.
Major concerns:
(1) Throughout the paper, the authors use repetitive optogenetic stimulation to activate SuM fibers and co-release glutamate and GABA. There are several issues here: first, can the authors definitively assure the reader that all the short-term plasticity is presynaptic and not due to ChR2 desensitization? This has not been addressed. Second, can the authors also say that all the activated fibers release both transmitters? If for example 20% of the fibers retained a one-transmitter identity and had distinct physiological properties, could that account for some of the physiological findings?
(2) PPR differences in Figures 1F-I are statistically significant but still quite small. You could say they are more similar than different in fact, and residual differences are accounted for by secondary factors like differential receptor saturation.
(3) The logic of the GPCR experiments needs a better setup. I could imagine that different fibers release different transmitters and have different numbers of mGluRs, so that one would get different modulations. Assuming that all the release is from a single population of boutons, either the mGluRs are differentially segregated within the bouton, or the vesicles have differential responsiveness to the same modulatory signal (presumably a reduced Ca current). This is not developed in the paper.
(4) The biphasic events of Figures 3 and S3: I find these (unaveraged) events a bit ambiguous. Another way to look at them is that they are not biphasic per se but rather are not categorizable. Moreover, these events are really tiny, perhaps generated by only a few receptors whose open probability is variable, thus introducing noise into the small currents.
(5) Figure 4 indicates that the immunohistochemical analysis is done on SuM terminals, but I do not see how the authors know that these terminals come from SuM vs other inputs that converge in DG.
(6) Figure 4E also shows many GluN1 terminals not associated with anything, not even Vglut, and the apparent numbers do not mesh with the statistics. Why?
(7) Do the conclusions based on the fluorescence immuno mesh with the apparent dimensions of the EM active zones and the apparent intermixing of labeled vesicles in immuno EM?
(8) Figure 6 is not so interesting to me and could be removed. It seems to test the obvious: EPSPs promote firing and IPSPs oppose it.
-
Reviewer #2 (Public Review):
Summary:
In this study, the authors investigated the release properties of glutamate/GABA co-transmission at the supramammillary nucleus (SuM)-granule cell (GC) synapses using in vitro electrophysiology and anatomical approaches at the light and electron microscopy level. They found that SuM to dentate granule cell synapses, which co-release glutamate and GABA, exhibit distinct differences in paired-pulse ratio, Ca2+ sensitivity, presynaptic receptor modulation, and Ca2+ channel-vesicle coupling configuration for each neurotransmitter. The study shows that glutamate/GABA co-release produces independent glutamatergic and GABAergic synaptic responses, with postsynaptic targets segregated. They show that most SuM boutons form distinct glutamatergic and GABAergic synapses in close proximity, characterized by GluN1 and GABAAα1 receptor labeling, respectively. Furthermore, they demonstrate that glutamate/GABA co-transmission exhibits distinct short-term plasticity, with glutamate showing frequency-dependent depression and GABA showing frequency-independent stable depression.
Their findings suggest that these distinct modes of glutamate/GABA co-release by SuM terminals serve as frequency-dependent filters of SuM inputs.
Strengths:
The conclusions of this paper are mostly well supported by the data.
Weaknesses:
Some aspects of Supplementary Figure 1A and the table need clarification. Specifically, the claim that the authors have stimulated an axon fiber rather than axon terminals is not convincingly supported by the diagram of the experimental setup. Additionally, the antibody listed in the primary antibodies section recognizes the gamma2 subunit of the GABAA receptor, not the alpha1 subunit mentioned in the results and Figure 4.
-
Reviewer #3 (Public Review):
Summary:
In this manuscript, Hirai et al investigated the release properties of glutamate/GABA co-transmission at SuM-GC synapses and reported that glutamate/GABA co-transmission exhibits distinct short-term plasticity with segregated postsynaptic targets. Using optogenetics, whole-cell patch-clamp recordings, and immunohistochemistry, the authors reveal distinct transmission modes of glutamate/GABA co-release as frequency-dependent filters of incoming SuM inputs.
Strengths:
Overall, this study is well-designed and executed; conclusions are supported by the results. This study addressed a long-standing question of whether GABA and glutamate are packaged in the same vesicles and co-released in response to the same stimuli in the SuM-GC synapses (Pedersen et al., 2017; Hashimotodani et al., 2018; Billwiller et al., 2020; Chen et al., 2020; Li et al., 2020; Ajibola et al., 2021). Knowledge gained from this study advances our understanding of neurotransmitter co-release mechanisms and their functional roles in the hippocampal circuits.
Weaknesses:
No major issues are noted. Some minor issues related to data presentation and experimental details are listed below.
-
-
-
eLife assessment
This study provides a novel and valuable alternative explanation for volatility-induced changes in choice behavior, commonly attributed to learning-rate adaptations. Through rigorous and comprehensive computational modeling of previously published data, the authors provide convincing support for the claim that apparent learning-rate adaptations may instead reflect a mixture of decision strategies. Furthermore, they demonstrate that differential weighting of the optimal decision strategy is predicted by psychopathology common to depression and anxiety. This work should be of interest to a wide range of scientists, including psychologists, neuroscientists, computer scientists, and clinicians.
-
Reviewer #2 (Public Review):
Summary:
Previous research shows that humans tend to adjust learning in environments where stimulus-outcome contingencies become more volatile. This learning rate adaptation is impaired in some psychiatric disorders, such as depression and anxiety. In this study the authors reanalyze previously published data on a reversal learning task with two volatility levels. Through a new model they provide some evidence for an alternative explanation whereby the learning rate adaptation is driven by different decision-making strategies and not learning deficits. In particular, they propose that adjusting of learning can be explained by deviations from the optimal decision-making strategy (based on maximizing expected utility) due to response stickiness or focus on reward magnitude. Furthermore, a factor related to general psychopathology of individuals with anxiety and depression negatively correlated with the weight on the optimal strategy and response stickiness, while it correlated positively with the magnitude strategy (a strategy that ignores the probability of outcome).
The main strength of the study is a novel and interesting explanation of an otherwise well-established finding in human reinforcement learning. This proposal is supported by rigorously conducted parameter retrieval and the comparison of the novel model to a wide range of previously published models. The authors explore from many angles, if and why the predictions from the new proposed model are superior to previously applied models.
My previous concerns were addressed in the revised version of the manuscript. I believe that the article now provides a new perspective on a well-established learning effect and offers a novel set of interesting response models that can be applied to a wide array of decision-making problems.
I see two limitations of the study not mentioned in the discussion of the manuscript. First, the task features binary inputs and responses, therefore unexpected uncertainty (volatility) is impossible to differentiate from the uncertainty about outcomes, and exploration is inseparable from random choices. Future work could validate these findings in task designs that allow to distinguish these processes. Second, clinical results are based on a small sample of patients and should be interpreted with this in mind.
-
Reviewer #3 (Public Review):
Summary:
This paper presents a new formulation of a computational model of adaptive learning amid environmental volatility. Using a behavioral paradigm and data set made available by the authors of an earlier publication (Gagne et al., 2020), the new model is found to fit the data well. The model's structure consists of three weighted controllers that influence decisions on the basis of (1) expected utility, (2) potential outcome magnitude, and (3) habit. The model offers an interpretation of psychopathology-related individual differences in decision-making behavior in terms of differences in the relative weighting of the three controllers.
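To make the three-controller structure concrete, the following is a minimal sketch of how such a weighted mixture could produce a choice probability. It is an illustration under stated assumptions, not the authors' implementation: the function names, the logistic choice rule, the shared inverse temperature `beta`, the rescaling of magnitudes to [0, 1], and the simple `habit1` term are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function; maps a value difference to a choice probability."""
    return 1.0 / (1.0 + math.exp(-x))


def mixture_choice_prob(p1, mag1, mag2, habit1, weights, beta):
    """Probability of choosing stimulus 1 under three weighted controllers.

    p1      : learned probability that stimulus 1 carries the feedback
    mag1/2  : displayed magnitudes, assumed rescaled to [0, 1]
    habit1  : running tendency to pick stimulus 1, in [0, 1] (assumed form)
    weights : (w_eu, w_mo, w_ha), non-negative and summing to 1
    beta    : inverse temperature (assumed shared by the EU and MO terms)
    """
    w_eu, w_mo, w_ha = weights
    # (1) Expected utility: probability times magnitude for each stimulus.
    pi_eu = sigmoid(beta * (p1 * mag1 - (1.0 - p1) * mag2))
    # (2) Magnitude only: ignores the learned probability entirely.
    pi_mo = sigmoid(beta * (mag1 - mag2))
    # (3) Habit: follows past choice tendency, ignoring the current offer.
    pi_ha = habit1
    return w_eu * pi_eu + w_mo * pi_mo + w_ha * pi_ha


# Illustrative values only: stimulus 1 is more likely but offers less.
p = mixture_choice_prob(p1=0.8, mag1=0.3, mag2=0.6, habit1=0.5,
                        weights=(0.5, 0.2, 0.3), beta=5.0)
print(round(p, 3))
```

In this sketch, shifting weight from the first to the second controller pulls choices toward the larger displayed magnitude even when the learned probability is unchanged, which is the kind of effect the review describes as accounting for apparent learning-rate differences.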
Strengths:
The newly proposed "mixture of strategies" (MOS) model is evaluated relative to the model presented in the original paper by Gagne et al., 2020 (here called the "flexible learning rate" or FLR model) and two other models. Appropriate and sophisticated methods are used for developing, parameterizing, fitting, and assessing the MOS model, and the MOS model performs well on multiple goodness-of-fit indices. Parameters of the model show decent recoverability and offer a novel interpretation for psychopathology-related individual differences. Most remarkably, the model seems to be able to account for apparent differences in behavioral learning rates between high-volatility and low-volatility conditions even with no true condition-dependent change in the parameters of its learning/decision processes. This finding calls into question a class of existing models that attribute behavioral adaptation to adaptive learning rates.
Weaknesses:
The authors have responded to the weaknesses noted previously.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1:
Point 1.1
Summary: This paper describes a reanalysis of data collected by Gagne et al. (2020), who investigated how human choice behaviour differs in response to changes in environmental volatility. Several studies to date have demonstrated that individuals appear to increase their learning rate in response to greater volatility and that this adjustment is reduced amongst individuals with anxiety and depression. The present authors challenge this view and instead describe a novel Mixture of Strategies (MOS) model, that attributes individual differences in choice behaviour to different weightings of three distinct decision-making strategies. They demonstrate that the MOS model provides a superior fit to the data and that the previously observed differences between patients and healthy controls may be explained by patients opting for a less cognitively demanding, but suboptimal, strategy.
Strengths:
The authors compare several models (including the original winning model in Gagne et al., 2020) that could feasibly fit the data. These are clearly described and are evaluated using a range of model diagnostics. The proposed MOS model appears to provide a superior fit across several tests.
The MOS model output is easy to interpret and has good face validity. This allows for the generation of clear, testable, hypotheses, and the authors have suggested several lines of potential research based on this.
We appreciate the efforts in understanding our manuscript. This is a good summary.
Point 1.2
The authors justify this reanalysis by arguing that learning rate adjustment (which has previously been used to explain choice behaviour on volatility tasks) is likely to be too computationally expensive and therefore unfeasible. It is unclear how to determine how "expensive" learning rate adjustment is, and how this compares to the proposed MOS model (which also includes learning rate parameters), which combines estimates across three distinct decision-making strategies.
We are sorry for this confusion. Our actual motivation is that previous models consider only learning rate adaptation as the response to different levels of environmental volatility. A drawback of these models is that they require a large number of parameters in multi-context experiments. We feel that learning rate adaptation may not be the only mechanism, or at least that alternative explanations may exist. Understanding the true mechanisms is particularly important for rehabilitation purposes, especially in our case of anxiety and depression. To clarify, we have removed all claims that learning rate adaptation is “too complex to understand”.
Point 1.3
As highlighted by the authors, the model is limited in its explanation of previously observed learning differences based on outcome value. It's currently unclear why there would be a change in learning across positive/negative outcome contexts, based on strategy choice alone.
Thanks for mentioning this limitation. We want to highlight two aspects of our work.
First, we developed the MOS6 model primarily to account for the learning rate differences between stable and volatile contexts, and between healthy controls and patients, not for differences between positive and negative outcomes. In other words, our model does not rule out different learning rates for positive and negative outcomes.
Second, Figure 3A shows that FLR (which contains different learning parameters for positive/negative outcomes) performed worse than MOS6 (which sets an identical learning rate for positive/negative outcomes). This result calls into question whether learning rate differences between positive/negative outcomes exist in our dataset.
Action: We now include this limitation in lines 784-793 in discussion:
“The MOS model is developed to offer context-free interpretations for the learning rate differences observed both between stable and volatile contexts and between healthy individuals and patients. However, we also recognize that the MOS account may not justify other learning rate effects based solely on strategy preferences. One such example is the valence-specific learning rate differences, where learning rates for better-than-expected outcomes are higher than those for worse-than-expected outcomes (Gagne et al., 2020). When fitted to the behavioral data, the context-dependent MOS22 model does not reveal valence-specific learning rates (Supplemental Note 4). Moreover, the valence-specific effect was not replicated in the FLR22 model when fitted to the synthesized data of MOS6.”
Point 1.4
Overall the methods are clearly presented and easy to follow, but lack clarity regarding some key features of the reversal learning task.
Throughout the method the stimuli are referred to as "right" and "left". It's not uncommon in reversal learning tasks for the stimuli to change sides on a trial-by-trial basis or counterbalanced across stable/volatile blocks and participants. It is not stated in the methods whether the shapes were indeed kept on the same side throughout. If this is the case, please state it. If it was not (and the shapes did change sides throughout the task) this may have important implications for the interpretation of the results. In particular, the weighting of the habitual strategy (within the Mixture of Strategies model) could be very noisy, as participants could potentially have been habitual in choosing the same side (i.e., performing the same motor movement), or in choosing the same shape. Does the MOS model account for this?
We are sorry for the confusion. Yes, the two shapes did change sides throughout the task. We have replaced “left” and “right” with “stimulus 1” and “stimulus 2”. We also acknowledge the possibility that participants may develop a habitual preference for a particular side rather than a shape. Because of the counterbalanced design, a habitual side preference will introduce random selection noise into choices, which should be captured by the MOS model through the inverse temperature parameter.
Point 1.5
Line 164: "Participants received points or money in the reward condition and an electric shock in the punishment condition." What determined whether participants received points or money, and did this differ across participants?
Thanks! We have the design clarified in lines 187-188:
“Each participant was instructed to complete two blocks of the volatile reversal learning task, one in the reward context and the other in the aversive context”,
and in lines:
“A total of 79 participants completed tasks in both feedback contexts. Four participants only completed the task in the reward context, while three participants only completed the aversive task.”
Point 1.6
Line 167: "The participant received feedback only after choosing the correct stimulus and received nothing else" Is this correct? In Figure 1a it appears the participant receives feedback irrespective of the stimulus they chose, by either being shown the amount 1-99 they are being rewarded/shocked, or 0. Additionally, what does the "correct stimulus" refer to across the two feedback conditions? It seems intuitive that in the reward version, the correct answer would be the rewarding stimulus - in the loss version is the "correct" answer the one where they are not receiving a shock?
Thanks for raising this issue. We removed the term “correct stimulus” and revised the lines 162-166 accordingly:
“Only one of the two stimuli was associated with actual feedback (0 for the other one). The feedback magnitude, ranged between 1-99, is sampled uniformly and independently for each shape from trial to trial. Actual feedback was delivered only if the stimulus associated with feedback was chosen; otherwise, a number “0” was displayed on the screen, signifying that the chosen stimulus returns nothing.”
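The feedback rule quoted above can be sketched as a small trial generator. This is a hedged illustration of the description only: the function names and the `p_stim1` parameter (the probability that stimulus 1 is the one associated with feedback on a given trial) are assumptions, and the actual task code may differ.

```python
import random


def sample_trial(p_stim1, rng):
    """Draw one trial: a magnitude per stimulus and the stimulus tied to feedback.

    p_stim1 is a hypothetical parameter: the chance that stimulus 1
    (index 0) is the one associated with actual feedback on this trial.
    """
    # Magnitudes are sampled uniformly and independently in 1..99 (inclusive).
    magnitudes = (rng.randint(1, 99), rng.randint(1, 99))
    feedback_stim = 0 if rng.random() < p_stim1 else 1
    return magnitudes, feedback_stim


def outcome(choice, magnitudes, feedback_stim):
    """Return the displayed feedback: the magnitude if the chosen stimulus
    is the one associated with feedback, otherwise 0."""
    return magnitudes[choice] if choice == feedback_stim else 0


rng = random.Random(0)
mags, fb = sample_trial(p_stim1=0.75, rng=rng)
print(mags, fb, outcome(0, mags, fb), outcome(1, mags, fb))
```

Exactly one of the two possible choices returns a non-zero value on each trial, which matches the statement that the other stimulus "returns nothing".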
Point 1.7
Line 176: "The whole experiment included two runs each for the two feedback conditions." Does this mean participants completed the stable and volatile blocks twice, for each feedback condition? (i.e., 8 blocks total, 4 per feedback condition).
Thanks! We have removed the term “block”, and now we refer to it as “context”. In particular, we removed phrases like “stable block” and “volatile block” and used “context” instead.
Action: See lines 187-189 for the revised version.
“Each participant was instructed to complete two runs of the volatile reversal learning task, one in the reward context and the other in the aversive context. Each run consisted of 180 trials, with 90 trials in the stable context and 90 in the volatile context (Fig. 1B).”
Point 1.8
In the expected utility (EU) strategy of the Mixture of Strategies model, the expected value of the stimulus on each trial is produced by multiplying the magnitude and probability of reward/shock. In Gagne et al.'s original paper, they found that an additive mixture of these components better captured participant choice behaviour - why did the authors not opt for the same strategy here?
Thanks for asking this. Their strategy basically amounts to a mixture of PF+MO+HA, where PF stands for the feedback probability (e.g., 0.3 or 0.7) without multiplication by the feedback magnitude. Ours, by contrast, is EU+MO+HA, where EU stands for feedback probability x feedback magnitude. We compared these two strategies, and the model using their strategy performed much worse than ours (see the red box below).
Author response image 1.
Thorough model comparison.
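The difference between a multiplicative (expected utility) valuation and an additive combination of probability and magnitude can be seen in a toy comparison. The sketch below is illustrative only: the function names, the mixing weight `lam`, and the example offers are hypothetical, with magnitudes assumed rescaled to [0, 1].

```python
def eu_value(prob, magnitude):
    """Multiplicative valuation: feedback probability times magnitude."""
    return prob * magnitude


def additive_value(prob, magnitude, lam):
    """Additive valuation: weighted sum of probability and magnitude.

    lam is a hypothetical mixing weight in [0, 1] placed on probability.
    """
    return lam * prob + (1.0 - lam) * magnitude


# Hypothetical offer: stimulus 1 is likely but small, stimulus 2 unlikely but large.
s1 = (0.9, 0.1)  # (probability, magnitude)
s2 = (0.1, 0.8)

eu_pick = 1 if eu_value(*s1) > eu_value(*s2) else 2
add_pick = 1 if additive_value(*s1, lam=0.3) > additive_value(*s2, lam=0.3) else 2
print(eu_pick, add_pick)
```

With these particular numbers the two rules favour different stimuli, so the choice between them is an empirical question of the kind the model comparison addresses.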
Point 1.9
How did the authors account for individuals with poor/inattentive responding, my concern is that the habitual strategy may be capturing participants who did not adhere to the task (or is this impossible to differentiate?).
The current MOS6 model distinguishes between the HA strategy and inattentive responding. Because of the counterbalanced design, the HA strategy requires participants to actively track the stimuli on the screen. In contrast, inattentive responding, such as repeating the same motor movement as mentioned in Point 1.4, should appear as random selection in the behavioral data, which should be accounted for by the inverse temperature parameter.
Point 1.10
The authors provide a clear rationale for, and description of, each of the computational models used to capture participant choice behaviour.
• Did the authors compare different combinations of strategies within the MOS model (e.g., only including one or two strategies at a time, and comparing fit?) I think more explanation is needed as to why the authors opted for those three specific strategies.
We appreciate this great advice. Following it, we conducted a thorough model comparison. Please refer to Figure R1 above. The detailed text descriptions of all the models in Figure R1 are included in Supplemental Note 1.
Point 1.11
Please report the mean and variability of each of the strategy weights, per group.
Thanks. We updated the mean and variability of the strategy weights in lines 490-503:
“We first focused on the fitted parameters of the MOS6 model. We compared the weight parameters of the three strategies (EU, MO, HA) across groups and conducted statistical tests on their logits. The patient group showed a ~37% preference towards the EU strategy, which is significantly weaker than the ~50% preference in healthy controls (healthy controls’ EU logit: M = 0.991, SD = 1.416; patients’ EU logit: M = 0.196, SD = 1.736; t(54.948) = 2.162, p = 0.035, Cohen’s d = 0.509; Fig. 4A). Meanwhile, the patients exhibited a weaker preference (~27%) for the HA strategy compared to healthy controls (~36%) (healthy controls’ HA logit: M = 0.657, SD = 1.313; patients’ HA logit: M = -0.162, SD = 1.561; t(56.311) = 2.455, p = 0.017, Cohen’s d = 0.574), but a stronger preference for the MO strategy (36% vs. 14%; healthy controls’ MO logit: M = -1.647, SD = 1.930; patients’ MO logit: M = -0.034, SD = 2.091; t(63.746) = -3.510, p = 0.001, Cohen’s d = 0.801). Most importantly, we also examined the learning rate parameter in the MOS6 but found no group differences (t(68.692) = 0.690, p = 0.493, Cohen’s d = 0.151). These results strongly suggest that the differences in decision strategy preferences can account for the learning behaviors in the two groups without necessitating any differences in learning rate per se.”
Point 1.12
The authors compare the strategy weights of patients and controls and conclude that patients favour more simpler strategies (see Line 417), based on the fact that they had higher weights for the MO, and lower on the EU.
(1) However, the finding that control participants were more likely to use the habitual strategy was largely ignored. Within the control group, were the participants significantly more likely to opt for the EU strategy, over the HA? (2) Further, on line 467 the authors state "Additionally, there was a significant correlation between symptom severity and the preference for the HA strategy (Pearson's r = -0.285, p = 0.007)." Apologies if I'm mistaken, but does this negative correlation not mean that the greater the symptoms, the less likely they were to use the habitual strategy?
I think more nuance is needed in the interpretation of these results, particularly in the discussion.
Thanks. The healthy participants seemed more likely to opt for the EU strategy, although this difference did not reach significance (paired-t(53) = 1.258, p = 0.214, Cohen’s d = 0.242). We systematically explored the role of HA. Compared to the MO, the HA also saves cognitive resources but yields a significantly higher hit rate (Fig. 4A). Therefore, a preference for the HA over the MO strategy may reflect a more sophisticated balance between reward and complexity within an agent: when healthier subjects run out of cognitive resources for the EU strategy, they will cleverly resort to the HA strategy, adopting a simpler strategy but still achieving a certain level of hit rate. This explains the negative symptom-HA correlation. Given how effective the HA strategy is, it is not surprising that the healthy control participants opt more for the HA during decision-making.
However, we are cautious about drawing strong conclusions from (1) the non-significant difference between EU and HA within healthy controls and (2) the negative symptom-HA correlation. The reason is that the MOS22, the context-dependent variant, (1) exhibited a significantly higher preference for EU over HA (paired-t(53) = 4.070, p < 0.001, Cohen’s d = 0.825) and (2) did not replicate this negative correlation (Supplemental Information Figure S3).
Action: Simulation analysis on the effects of HA was introduced in lines 556-595 and Figure 4. We discussed the effects of HA in lines 721-733:
“Although many observed behavioral differences can be explained by a shift in preference from the EU to the MO strategy among patients, we also explore the potential effects of the HA strategy. Compared to the MO, the HA strategy also saves cognitive resources but yields a significantly higher hit rate (Fig. 4A). Therefore, a preference for the HA over the MO strategy may reflect a more sophisticated balance between reward and complexity within an agent (Gershman, 2020): when healthier participants exhaust their cognitive resources for the EU strategy, they may cleverly resort to the HA strategy, adopting a simpler strategy but still achieving a certain level of hit rate. This explains the stronger preference for the HA strategy in the HC group (Fig. 3A) and the negative correlation between HA preferences and symptom severity (Fig. 5). Apart from shedding light on the cognitive impairments of patients, the inclusion of the HA strategy significantly enhances the model’s fit to human behavior (see examples in Daw et al. (2011); Gershman (2020); and also Supplemental Note 1 and Supplemental Figure S3).”
Point 1.13
Line 513: "their preference for the slowest decision strategy" - why is the MO considered the slowest strategy? Is it not the least cognitively demanding, and therefore, the quickest?
Sorry for the confusion. In Fig. 5C, we conducted simulations to estimate the learning speed for each strategy. As shown below, the MO strategy exhibits a flat learning curve. Our claim on the learning speed was based solely on simulation outcomes without referring to cognitive demands. Note that our analysis did not aim to compare the cognitive demands of the MO and HA strategies directly.
Action: We explain the learning speed of the three strategies in lines 571-581.
Point 1.14
The authors argue that participants chose suboptimal strategies, but do not actually report task performance. How does strategy choice relate to the performance on the task (in terms of number of rewards/shocks)? Did healthy controls actually perform any better than the patient group?
Thanks for the suggestion. The answers are: (1) the EU strategy is the most rewarding, followed by the HA and then the MO (Fig. 5A), and (2) yes, healthy controls did perform better than patients in terms of hit rate (Fig. 2).
Action: We included additional sections on the above analyses in lines 561-570 and lines 397-401.
Point 1.15
The authors speculate that Gagne et al. (2020) did not study the relationship between the decision process and anxiety and depression, because it was too complex to analyse. It's unclear why the FLR model would be too complex to analyse. My understanding is that the focus of Gagne's paper was on learning rate (rather than noise or risk preference) due to this being the main previous finding.
Thanks! Yes, our previous arguments were vague and confusing. We have removed all arguments of this kind.
Point 1.16
Minor Comments:
• Line 392: Modeling fitting > Model fitting
• Line 580 reads "The MO and HA are simpler heuristic strategies that are cognitively demanding."
- should this read as less cognitively demanding?
• Line 517: health > healthy
• Line 816: Desnity > density
Sorry for the typos! They have all been fixed.
Reviewer #2:
Point 2.1
Summary: Previous research shows that humans tend to adjust learning in environments where stimulus-outcome contingencies become more volatile. This learning rate adaptation is impaired in some psychiatric disorders, such as depression and anxiety. In this study, the authors reanalyze previously published data on a reversal-learning task with two volatility levels. Through a new model, they provide some evidence for an alternative explanation whereby the learning rate adaptation is driven by different decision-making strategies and not learning deficits. In particular, they propose that adjusting learning can be explained by deviations from the optimal decision-making strategy (based on maximizing expected utility) due to response stickiness or focus on reward magnitude. Furthermore, a factor related to the general psychopathology of individuals with anxiety and depression negatively correlated with the weight on the optimal strategy and response stickiness, while it correlated positively with the magnitude strategy (a strategy that ignores the probability of outcome).
Thanks for evaluating our paper. This is a good summary.
Point 2.2
My main concern is that the winning model (MOS6) does not have an error term (inverse temperature parameter beta is fixed to 8.804).
(1) It is not clear why the beta is not estimated and how were the values presented here chosen. It is reported as being an average value but it is not clear from which parameter estimation. Furthermore, with an average value for participants that would have lower values of inverse temperature (more stochastic behaviour) the model is likely overfitting.
(2) In the absence of a noise parameter, the model will have to classify behaviour that is not explained by the optimal strategy (where participants simply did not pay attention or were not motivated) as being due to one of the other two strategies.
We apologize for any confusion caused by our writing. We did set the inverse temperature as a free parameter and estimated it quantitatively during model fitting and comparison. We also created a table to show the free parameters for each model. In the previous manuscript, we did mention “temperature parameter beta is fixed to 8.804”, but only for the model simulation part, which was conducted to interpret some model behaviors.
We agree with the concern that using the averaged value of the inverse temperature could lead to overfitting to more stochastic behaviors. To mitigate this issue, we now use the median as a more representative value for the population during simulation. Nonetheless, this change does not affect our conclusion (see simulation results in Figures 4&6).
Action: We now use the term “free parameter” to emphasize that the inverse temperature was fitted rather than fixed. We also created a new table, “Table 1”, in line 458 to show all the free parameters within a model. We also updated the simulation details in lines 363-391 for clarification.
Point 2.3
(3) A model comparison among models with inverse temperature and variable subsets of the three strategies (EU + MO, EU + HA) would be interesting to see, as would a comparison of the MOS6 model to other models where the inverse temperature parameter is fixed to 8.804.
This is an important limitation because the same simulation as with the MOS model in Figure 3b can be achieved by a more parsimonious (but less interesting) manipulation of the inverse temperature parameter.
Thanks, we added a comparison between the MOS6 and the two lesion models (EU + MO, EU + HA). Please refer to the figure below and Point 1.8.
We also realize that the MO strategy could exhibit averaged learning curves similar to random selection. To confirm that patients' slower learning rates are due to a preference for the MO strategy, we compared the MOS6 model with a variant (see the red box below) in which the MO strategy is replaced by Random (RD) selection that assigns a 0.5 probability to both choices. This comparison showed that the original MOS6 model with the MO strategy better fits human data.
Author response image 2.
Point 2.4
Furthermore, the claim that the EU represents an optimal strategy is a bit overstated. The EU strategy is the only one of the three that assumes participants learn about the stimulus-outcomes contingencies. Higher EU strategy utilisation will include participants that are more optimal (in maximum utility maximisation terms), but also those that just learned better and completely ignored the reward magnitude.
Thank you for your feedback. We have now revised the paper to remove all statements that the “EU strategy is the optimal” and replaced them with “EU strategy is rewarding but complex”. We agree that both the EU strategy and the strategy that focuses only on feedback probability (i.e., ignoring the reward magnitude, referred to as the PF strategy) are rewarding but more complex than the two simple heuristics. We also included the latter strategy in our model comparisons (see the next section, Point 2.5).
Point 2.5
The mixture strategies model is an interesting proposal, but seems to be a very convoluted way to ask: to what degree are decisions of subjects affected by reward, what they've learned, and response stickiness? It seems to me that the same set of questions could be addressed with a simpler model that would define choice decisions through a softmax with a linear combination of the difference in rewards, the difference in probabilities, and a stickiness parameter.
Thanks for suggesting this model. We did include the proposed linear combination model (see “linear comb.” in the red box below) and found that it performed significantly worse than MOS6.
Action: We justified our model selection criterion in the Supplemental Note 1.
Author response image 3.
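The simpler model suggested in this point can be sketched as follows. This is a minimal illustration, not the implementation compared in the response: the weight names (`w_mag`, `w_prob`, `w_stick`) and the ±1 coding of the previous choice are assumptions made for the example.

```python
import math

def p_choose_1(m1, m2, p1, p2, prev_choice, w_mag, w_prob, w_stick):
    """Probability of choosing stimulus 1 under a two-option softmax.

    The decision variable is a linear combination of the magnitude
    difference, the probability difference, and a stickiness term.
    All weight names are hypothetical.
    """
    # Stickiness: +1 if stimulus 1 was chosen last, -1 if stimulus 2, 0 on the first trial.
    if prev_choice is None:
        stick = 0.0
    else:
        stick = 1.0 if prev_choice == 1 else -1.0
    logit = w_mag * (m1 - m2) + w_prob * (p1 - p2) + w_stick * stick
    # With two options, the softmax reduces to a logistic function of the logit.
    return 1.0 / (1.0 + math.exp(-logit))
```

In this form the three weights jointly absorb the inverse temperature, so no separate noise parameter is needed.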
Point 2.6
Learning rate adaptation was also shown with tasks where decision-making strategies play a less important role, such as the Predictive Inference task (see for instance Nassar et al, 2010). When discussing the merit of the findings of this study on learning rate adaptation across volatility blocks, this work would be essential to mention.
Thanks for mentioning this great experimental paradigm, which provides an ideal solution for dissociating probability learning from the decision process. We have discussed this paradigm as well as the associated papers in the Discussion (lines 749-751, 763-765, and 796-801).
Point 2.7
Minor mistakes that I've noticed:
Equation 6: The learning rate for response stickiness is sometimes defined as alpha_AH or alpha_pi.
Supplementary material (SM) Contents are lacking in Note1. SM talks about model MOS18, but it is not defined in the text (I am assuming it is MOS22 that should be talked about here).
Thanks! Fixed.
Reviewer #3:
Point 3.1
Summary: This paper presents a new formulation of a computational model of adaptive learning amid environmental volatility. Using a behavioral paradigm and data set made available by the authors of an earlier publication (Gagne et al., 2020), the new model is found to fit the data well. The model's structure consists of three weighted controllers that influence decisions on the basis of (1) expected utility, (2) potential outcome magnitude, and (3) habit. The model offers an interpretation of psychopathology-related individual differences in decision-making behavior in terms of differences in the relative weighting of the three controllers.
Strengths: The newly proposed "mixture of strategies" (MOS) model is evaluated relative to the model presented in the original paper by Gagne et al., 2020 (here called the "flexible learning rate" or FLR model) and two other models. Appropriate and sophisticated methods are used for developing, parameterizing, fitting, and assessing the MOS model, and the MOS model performs well on multiple goodness-of-fit indices. The parameters of the model show decent recoverability and offer a novel interpretation for psychopathology-related individual differences. Most remarkably, the model seems to be able to account for apparent differences in behavioral learning rates between high-volatility and low-volatility conditions even with no true condition-dependent change in the parameters of its learning/decision processes. This finding calls into question a class of existing models that attribute behavioral adaptation to adaptive learning rates.
Thanks for evaluating our paper. This is a good summary.
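The three-controller structure summarized in this point can be sketched as a weighted mixture of choice policies. This is a rough illustration only and not the authors' MOS6 implementation: the function names, the softmax form of the EU and MO controllers, and the simplification that the second stimulus has feedback probability `1 - p1` are all assumptions.

```python
import math

def softmax2(v1, v2, beta):
    """Two-option softmax with inverse temperature beta; returns P(option 1)."""
    return 1.0 / (1.0 + math.exp(-beta * (v1 - v2)))

def mixture_policy(p1, m1, m2, habit_p1, weights, beta):
    """P(choose stimulus 1) as a weighted mix of three controllers.

    p1       : learned feedback probability of stimulus 1 (assumed p2 = 1 - p1)
    m1, m2   : outcome magnitudes of the two stimuli
    habit_p1 : the habit controller's current P(choose stimulus 1)
    weights  : (w_eu, w_mo, w_ha), non-negative, normalised below
    """
    w_eu, w_mo, w_ha = weights
    total = w_eu + w_mo + w_ha
    p2 = 1.0 - p1
    pi_eu = softmax2(p1 * m1, p2 * m2, beta)  # expected utility: probability x magnitude
    pi_mo = softmax2(m1, m2, beta)            # magnitude only: ignores probability
    pi_ha = habit_p1                          # habit: repeats past choices
    return (w_eu * pi_eu + w_mo * pi_mo + w_ha * pi_ha) / total
```

Under this sketch, individual differences enter only through `weights`, which is the sense in which the model attributes group differences to strategy preferences.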
Point 3.2
(1) Some aspects of the paper, especially in the methods section, lacked clarity or seemed to assume context that had not been presented. I found it necessary to set the paper down and read Gagne et al., 2020 in order to understand it properly.
(3) Clarification-related suggestions for the methods section:
- Explain earlier that there are 4 contexts (reward/shock crossed with high/low volatility). Lines 252-307 contain a number of references to parameters being fit separately per context, but "context" was previously used only to refer to the two volatility levels.
Action: We have placed the explanation as well as the table about the 4 contexts (stable-reward/stable-aversive/volatile-reward/volatile-aversive) earlier in the section that introduces the experiment paradigm (lines 177-186):
“Participants were supposed to complete this learning and decision-making task in four experimental contexts (Fig. 1A): two feedback contexts (reward or aversive) × two volatility contexts (stable or volatile). Participants received points in the reward context and an electric shock in the aversive context. The reward points in the reward context were converted into a monetary bonus by the end of the task, ranging from £0 to £10. In the stable context, the dominant stimulus (i.e., the stimulus that induces feedback with a higher probability) provided feedback with a fixed probability of 0.75, while the other one yielded feedback with a probability of 0.25. In the volatile context, the dominant stimulus’s feedback probability was 0.8, but the dominant stimulus switched between the two every 20 trials. Hence, this design required participants to actively learn and infer the changing stimulus-feedback contingency in the volatile context.”
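The contingency schedule described in this passage can be sketched as below. This is an illustrative sketch, not the original task code: the function name, the `n_trials` argument, the starting dominant stimulus, and the assumption that the non-dominant stimulus in the volatile context has the complementary probability (1 - 0.8) are not stated in the text.

```python
def feedback_probabilities(context, n_trials, switch_every=20, start_dominant=0):
    """Return a list of (p_stimulus0, p_stimulus1) feedback probabilities per trial.

    context: "stable"   -> dominant stimulus fixed at 0.75 (other 0.25)
             "volatile" -> dominant stimulus at 0.8, identity switching
                           every `switch_every` trials (other assumed 1 - 0.8)
    """
    if context not in ("stable", "volatile"):
        raise ValueError("context must be 'stable' or 'volatile'")
    p_dom = 0.75 if context == "stable" else 0.8
    dominant = start_dominant
    schedule = []
    for t in range(n_trials):
        # Only the volatile context reverses which stimulus is dominant.
        if context == "volatile" and t > 0 and t % switch_every == 0:
            dominant = 1 - dominant
        if dominant == 0:
            schedule.append((p_dom, 1.0 - p_dom))
        else:
            schedule.append((1.0 - p_dom, p_dom))
    return schedule
```

Crossing this schedule with the two feedback types (points or shocks) gives the four experimental contexts.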
- It would be helpful to provide an initial outline of the four models that will be described since the FLR, RS, and PH models were not foreshadowed in the introduction. For the FLR model in particular, it would be helpful to give a narrative overview of the components of the model before presenting the notation.
Action: We now include an overview paragraph in the computational model section that outlines the four models and the hypotheses embodied in each (lines 202-220).
- The subsection on line 343, describing the simulations, lacks context. There are references to three effects being simulated (and to "the remaining two effects") but these are unclear because there's no statement in this section of what the three effects are.
- Lines 352-353 give group-specific weighting parameters used for the simulations of the HC and PAT groups in Figure 4B. A third, non-group-specific set of weighting parameters is given above on lines 348-349. What were those used for?
- Line 352 seems to say Figure 4A is plotting a simulation, but the figure caption seems to say it is plotting empirical data.
These paragraphs have been rewritten and the above-mentioned issues have been clarified. See lines 363-392.
Point 3.2
(2) There is little examination of why the MOS model does so well in terms of model fit indices. What features of the data is it doing a better job of capturing? One thing that makes this puzzling is that the MOS and FLR models seem to have most of the same qualitative components: the FLR model has parameters for additive weighting of magnitude relative to probability (akin to the MOS model's magnitude-only strategy weight) and for an autocorrelative choice kernel (akin to the MOS model's habit strategy weight). So it's not self-evident where the MOS model's advantage is coming from.
An intuitive understanding of the FLR model is that it estimates stimulus value through a linear combination of the probability of feedback (PF) and (non-linear) magnitude. See equation:
Also, the FLR model includes the HA mechanism as:
In other words, the FLR model combines the probability of feedback (PF), MO, and HA mechanisms (see Eq. XX in the original study), whereas our MOS model combines EU, MO, and HA. The key qualitative difference between FLR and MOS is the use of the expected utility formula (EU) instead of the probability of feedback (PF). The advantage of our MOS model is supported by our model comparisons, which indicate that human participants multiply probability and magnitude rather than considering probability alone. The EU strategy is also supported by a large body of literature (Gershman et al., 2015; Von Neumann & Morgenstern, 1947).
Making decisions based on the multiplication of feedback probability and magnitude can often yield very different results compared to decisions based on a linear combination of the two, especially when the two magnitudes have a small absolute difference but a large ratio. Let’s consider two cases:
(1) Stimulus 1: vs. Stimulus 2:
(2) Stimulus 1: vs. Stimulus 2:
The EU strategy may opt for stimulus 2 in both cases, since stimulus 2 always has a larger expected value. However, it is very likely for the PF+MO to choose stimulus 1 in the first case. For example, when . If we want the PF+MO to also choose stimulus 2 to align with the EU strategy, we need to increase the weight on magnitude. Note that in this example we divided the magnitude value by 100 to ensure that probability and magnitude are on the same scale to help illustration.
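The contrast between multiplying and linearly combining probability and magnitude can be made concrete with a small sketch. The stimulus values and weights below are hypothetical numbers chosen for illustration (they are not the values from the two cases above), and the linear rule uses raw magnitude rather than the FLR model's non-linear magnitude term.

```python
def eu_value(p, m):
    """Expected utility: probability multiplied by magnitude."""
    return p * m

def pf_mo_value(p, m, lam):
    """Linear combination: lam weights probability, (1 - lam) weights magnitude."""
    return lam * p + (1.0 - lam) * m

def preferred(v1, v2):
    """Return 1 or 2, the index of the stimulus with the larger value."""
    return 1 if v1 > v2 else 2

# Hypothetical stimuli as (feedback probability, magnitude / 100):
# the magnitudes differ by only 0.15 in absolute terms but by a factor of 4.
s1 = (0.6, 0.05)
s2 = (0.4, 0.20)

eu_choice = preferred(eu_value(*s1), eu_value(*s2))
# Heavy weight on probability: the linear rule disagrees with EU.
linear_choice_prob_heavy = preferred(pf_mo_value(*s1, lam=0.8), pf_mo_value(*s2, lam=0.8))
# More weight on magnitude: the linear rule agrees with EU again.
linear_choice_mag_heavy = preferred(pf_mo_value(*s1, lam=0.3), pf_mo_value(*s2, lam=0.3))

print(eu_choice, linear_choice_prob_heavy, linear_choice_mag_heavy)
```

With these numbers the EU rule prefers stimulus 2 (0.08 vs. 0.03), the probability-heavy linear rule prefers stimulus 1 (0.49 vs. 0.36), and raising the magnitude weight flips the linear rule back to stimulus 2 (0.26 vs. 0.215).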
In the dataset reported by Gagne et al. (2020), the described scenario seems to occur more often in the aversive context than in the reward context. To accurately capture human behaviors, the FLR22 model requires a significantly larger weight for magnitude in the aversive context than in the reward context. Interestingly, when the weights for magnitude in different contexts are forced to be equal, the model (FLR6) fails, exhibiting an almost chance-level performance throughout learning (Fig. 3E, G). In contrast, the MOS6 model, and even the RS3 model, exhibit good performance using one identical set of parameters across contexts. Both MOS6 and RS3 include the EU strategy during decision-making. These findings suggest humans make decisions using the EU strategy rather than PF+MO.
The focus of our paper is to show that a good-enough model can interpret the same dataset from a completely different perspective, not necessarily to explore improvements to the FLR model.
Point 3.3
One of the paper's potentially most noteworthy findings (Figure 5) is that when the FLR model is fit to synthetic data generated by the expected utility (EU) controller with a fixed learning rate, it recovers a spurious difference in learning rate between the volatile and stable environments. Although this is potentially a significant finding, its interpretation seems uncertain for several reasons:
- According to the relevant methods text, the result is based on a simulation of only 5 task blocks for each strategy. It would be better to repeat the simulation and recovery multiple times so that a confidence interval or error bar can be estimated and added to the figure.
- It makes sense that learning rates recovered for the magnitude-oriented (MO) strategy are near zero, since behavior simulated by that strategy would have no reason to show any evidence of learning. But this makes it perplexing why the MO learning rate in the volatile condition is slightly positive and slightly greater than in the stable condition.
- The pure-EU and pure-MO strategies are interpreted as being analogous to the healthy control group and the patient group, respectively. However, the actual difference in estimated EU/MO weighting between the two participant groups was much more moderate. It's unclear whether the same result would be obtained for a more empirically plausible difference in EU/MO weighting.
- The fits of the FLR model to the simulated data "controlled all parameters except for the learning rate parameters across the two strategies" (line 522). If this means that no parameters except learning rate were allowed to differ between the fits to the pure-EU and pure-MO synthetic data sets, the models would have been prevented from fitting the difference in terms of the relative weighting of probability and magnitude, which better corresponds to the true difference between the two strategies. This could have interfered with the estimation of other parameters, such as learning rate.
- If, after addressing all of the above, the FLR model really does recover a spurious difference in learning rate between stable and volatile blocks, it would be worth more examination of why this is happening. For example, is it because there are more opportunities to observe learning in those blocks?
I would recommend performing a version of the Figure 5 simulations using two sets of MOS-model parameters that are identical except that they use healthy-control-like and patient-like values of the EU and MO weights (similar to the parameters described on lines 346-353, though perhaps with the habit controller weight equated). Then fit the simulated data with the FLR model, with learning rate and other parameters free to differ between groups. The result would be informative as to (1) whether the FLR model still misidentifies between-group strategy differences as learning rate differences, and (2) whether the FLR model still identifies spurious learning rate differences between stable and volatile conditions in the control-like group, which become attenuated in the patient-like group.
Many thanks for this great advice. Following your suggestions, we now conduct simulations using the median of the fitted parameters. The representative parameter sets for healthy controls and patients are identical except for the three preference parameters; moreover, the habit weights are not controlled to be equal. We ran 20 simulations for each representative parameter set, each comprising 4 task sequences sampled from the behavioral data. This allowed us to create error bars and perform statistical tests. We found that the differences in learning rates between stable and volatile conditions, as well as the learning rate adaptation differences between healthy controls and patients, still persisted.
Combined with the discussion in Point 3.2, we justify why a mixture of strategies can account for learning rate adaptation as follows. Owing to (unknown) differences in task sequences, the MOS6 model exhibits more MO-like behaviors through its use of the EU strategy. To capture this behavior pattern, the FLR22 model has to increase its weighting parameter 1-λ for magnitude, which could ultimately drive the FLR22 to adjust the fitted learning rate parameters, exhibiting a learning rate adaptation effect. Our simulations suggest that estimating learning rate just by model fitting may not be the only way to interpret the data.
Action: We included the simulation details in the method section (lines 381-391).
“In one simulated experiment, we sampled the four task sequences from the real data. We simulated 20 experiments with the parameters of to mimic the behavior of the healthy control participants. The first three are the medians of the fitted parameters across all participants; the latter three were chosen to approximate the strategy preferences of real healthy control participants (Figure 4A). Similarly, we also simulated 20 experiments for the patient group with the identical values of , and , but different strategy preferences . In other words, the only difference in the parameters of the two groups is the switched and . We then fitted the FLR22 to the behavioral data generated by the MOS6 and examined the learning rate differences across groups and volatility contexts (Fig. 6).”
Point 3.4
Figure 4C shows that the habit-only strategy is able to learn and adapt to changing contingencies, and some of the interpretive discussion emphasizes this. (For instance, line 651 says the habit strategy brings more rewards than the MO strategy.) However, the habit strategy doesn't seem to have any mechanism for learning from outcome feedback. It seems unlikely it would perform better than chance if it were the sole driver of behavior. Is it succeeding in this example because it is learning from previous decisions made by the EU strategy, or perhaps from decisions in the empirical data?
Yes, the intuition is that the HA strategy has no learning mechanism of its own. In practice, however, it yields a higher hit rate than MO simply by learning from previous decisions made by the EU strategy. We ran simulations to confirm this (Figure 4B).
Point 3.5
For the model recovery analysis (line 567), the stated purpose is to rule out the possibility that the MOS model always wins (line 552), but the only result presented is one in which the MOS model wins. To assess whether the MOS and FLR models can be differentiated, it seems necessary also to show model recovery results for synthetic data generated by the FLR model.
Sure, we conducted a model recovery analysis that includes all models, and it demonstrates that MOS and FLR can be fully differentiated. The results of the new model recovery analysis are shown in Fig. 7.
Point 3.6
To the best of my understanding, the MOS model seems to implement valence-specific learning rates in a qualitatively different way from how they were implemented in Gagne et al., 2020, and other previous literature. Line 246 says there were separate learning rates for upward and downward updates to the outcome probability. That's different from using two learning rates for "better"- and "worse"-than-expected outcomes, which will depend on both the direction of the update and the valence of the outcome (reward or shock). Might this relate to why no evidence for valence-specific learning rates was found even though the original authors found such evidence in the same data set?
Thanks. Following the suggestion, we have corrected our implementation of valence-specific learning rate in all models (see lines 261-268).
“To remain consistent with Gagne et al. (2020), we also explored the valence-specific learning rate,
is the learning rate for better-than-expected outcome, and for worse-than-expected outcome. It is important to note that Eq. 6 was only applied to the reward context, and the definitions of “better-than-expected” and “worse-than-expected” should change accordingly in the aversive context, where we defined for and for .
No main effect of valence on learning rate was found (see Supplemental Information Note 3).”
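The valence-specific scheme described here can be sketched as a delta-rule update whose learning rate depends on whether the outcome was better or worse than expected. This is a minimal sketch under stated assumptions, not the revised implementation: the function and parameter names are hypothetical, and the mapping of feedback to "better" or "worse" (points are good in the reward context, shocks are bad in the aversive context) is an assumed reading of the text.

```python
def valence_update(p, feedback, context, alpha_better, alpha_worse):
    """One delta-rule update of the estimated feedback probability p.

    feedback : 1 if the stimulus produced feedback on this trial, else 0
    context  : "reward" (feedback = points) or "aversive" (feedback = shock)
    The learning rate is alpha_better for better-than-expected outcomes
    and alpha_worse for worse-than-expected outcomes.
    """
    if context == "reward":
        better = feedback == 1   # receiving points is the good outcome
    elif context == "aversive":
        better = feedback == 0   # avoiding a shock is the good outcome
    else:
        raise ValueError("context must be 'reward' or 'aversive'")
    alpha = alpha_better if better else alpha_worse
    return p + alpha * (feedback - p)
```

Note that the same upward update (feedback = 1) uses `alpha_better` in the reward context but `alpha_worse` in the aversive context, which is the distinction raised in this point.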
Point 3.7
The discussion (line 649) foregrounds the finding of greater "magnitude-only" weights with greater "general factor" psychopathology scores, concluding it reflects a shift toward simplifying heuristics. However, the picture might not be so straightforward because "habit" weights, which also reflect a simplifying heuristic, correlated negatively with the psychopathology scores.
Thanks. In contrast to the detrimental effects of “MO”, “habit” is actually beneficial for the task. Please refer to Point 1.12.
Point 3.8
The discussion section contains some pejorative-sounding comments about Gagne et al. 2020 that lack clear justification. Line 611 says that the study "did not attempt to connect the decision process to anxiety and depression traits." Given that linking model-derived learning rate estimates to psychopathology scores was a major topic of the study, this broad statement seems incorrect. If the intent is to describe a more specific step that was not undertaken in that paper, please clarify. Likewise, I don't understand the justification for the statement on line 615 that the model from that paper "is not understandable" - please use more precise and neutral language to describe the model's perceived shortcomings.
Sorry for the confusion. We have removed all of the above-mentioned pejorative-sounding comments.
Point 3.9
4. Minor suggestions:
- Line 114 says people with psychiatric illness "are known to have shrunk cognitive resources" - this phrasing comes across as somewhat loaded.
Thanks. We have removed this argument.
- Line 225, I don't think the reference to "hot hand bias" is correct. I understand hot hand bias to mean overestimating the probability of success after past successes. That's not the same thing as habitual repetition of previous responses, which is what's being discussed here.
Response: Thanks for mentioning this. We have removed all discussions about “hot hand bias”.
- There may be some notational inconsistency if alpha_pi on line 248 and alpha_HA on line 253 are referring to the same thing.
Thanks! Fixed!
- Check the notation on line 285 - there may be some interchanging of decimals and commas.
Thanks! Fixed!
Also, would the interpretation in terms of risk seeking and risk aversion be different for rewarding versus aversive outcomes?
Thanks for asking. If we understand correctly, risk-seeking and risk-aversion mechanisms are only present in the RS models, which show clearly worse fitting performance. We have therefore decided not to over-interpret the fitted parameters in the RS models.
- Line 501, "HA and PAT groups" looks like a typo.
- In Figure 5, better graphical labeling of the panels and axes would be helpful.
Response: Thanks! Fixed!
REFERENCES
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204-1215.
Gagne, C., Zika, O., Dayan, P., & Bishop, S. J. (2020). Impaired adaptation of learning to contingency volatility in internalizing psychopathology. eLife, 9.
Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394.
Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245), 273-278.
Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior (2nd rev. ed.). Princeton University Press.
-
-
www.biorxiv.org www.biorxiv.org
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.
Strengths:
The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.
Weaknesses:
The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.
The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one that is currently used for routinely delimiting operational taxonomic units in RNA viruses and reconstructing their evolutionary history (Edgar et al. 2022, see also the Serratus project; https://serratus.io/); therefore, we took the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand years ago to 293 million years ago depending on modeling assumptions). Only future genomic work across Coronaviridae that will characterize multiple genetic regions with different evolutionary rates will allow us to precisely elucidate the timescale of the evolutionary history of coronaviruses alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, it strongly suggests, when used along with cophylogenetic approaches, a recent evolutionary origin in bats.
We now further discuss these issues and the perspectives offered by future genomic work on lines 462-485.
Reviewer #2 (Public Review):
Summary:
In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.
The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.
Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.
Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.
A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.
The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.
In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.
Strengths:
The study is conceptually robust, and its conclusions are convincing.
Weaknesses:
Despite the availability of a dated host tree the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.
Our intuition is that ALE in its “dated” version does not necessarily fail on our dataset because of its size: ALE runs, but it provides unrealistic parameter estimates and is not able to output possible reconciliations, as mentioned in our Material and Methods section. We think this issue is mostly due to the absence of a codiversification pattern: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between these trees with time-consistent switches is very difficult, and ALE fails at estimating an amalgamated likelihood for such an unlikely scenario. We have now run the dated version of ALE independently on the smaller alpha- and betacoronavirus datasets. It still fails on the betacoronavirus dataset. On the alphacoronavirus dataset, it does output significant reconciliations; however, these reconciliations consist mostly of transfer and loss events, confirming that codiversification is unlikely in this clade.
Reviewer #3 (Public Review):
Summary:
This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.
Strengths:
The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.
Weaknesses:
I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that it is a fairly prominent issue that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).
I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.
Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.
We totally agree that sampling biases in the virome of mammals are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still reported low support for a scenario of codiversification, the origin in bats in East Asia, the preferential host switches within mammalian orders, and the rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host switches involve unsampled intermediate hosts. To address the reviewer's comment, we now better underline the importance of sampling biases in our main text (see Discussion, lines 487-494) with supporting references (note that we did not find the Cohen et al. Nature Comm reference). We also better highlight our sensitivity analyses by moving them from the Supporting Information Text to the main text.
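As an illustration only, the kind of subsampling described above (capping the number of virus records retained per host species so that heavily screened hosts do not dominate) can be sketched as follows. This is a minimal sketch, not the procedure used in the manuscript: the function name `subsample_equal_effort`, the `max_per_host` cap, and the toy host and OTU labels are all hypothetical, and the actual procedure is the one described in the Supporting Information Text.

```python
import random
from collections import defaultdict


def subsample_equal_effort(associations, max_per_host, seed=0):
    """Keep at most `max_per_host` virus records per host species.

    `associations` is a list of (host, virus) pairs. Hypothetical helper:
    it only illustrates the idea of equalising sampling effort across hosts.
    """
    rng = random.Random(seed)  # fixed seed so a given replicate is reproducible
    by_host = defaultdict(list)
    for host, virus in associations:
        by_host[host].append(virus)

    subsampled = []
    for host in sorted(by_host):  # sorted for a deterministic output order
        viruses = by_host[host]
        if len(viruses) > max_per_host:
            # Draw without replacement from the over-sampled host
            viruses = rng.sample(viruses, max_per_host)
        subsampled.extend((host, virus) for virus in viruses)
    return subsampled


# Toy data: one heavily screened host and one poorly screened host
toy = [("bat_A", f"otu{i}") for i in range(10)]
toy += [("rodent_B", "otu10"), ("rodent_B", "otu11")]
reduced = subsample_equal_effort(toy, max_per_host=3)
```

In a sensitivity analysis of this kind, the downstream analysis would be repeated over many such replicates (different seeds) to check that the conclusions do not depend on which records were retained.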
We agree that distinguishing between alpha and beta coronaviruses provides useful additional insights. We have run separate cophylogenetic analyses for these two sub-clades and now report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.
We were not aware of the work of Geoghegan et al. (see 2017, PLOS Pathogens), thank you for providing this reference that is now cited.
Reviewer #1 (Recommendations For The Authors):
(1) Overall I found this paper to be quite difficult to follow. The text needs clearer structure, which can be helped by writing in shorter paragraphs and adding section headings. For example, there are some very long paragraphs starting on L83, L176, L215, L511, and L598.
We have now added section headings and divided these paragraphs into smaller ones.
(2) It would be helpful to define some of the key terminology relating to the evolutionary interactions between the viruses and their hosts. Some of the terms that are typically used in the context include "coevolution", "cospeciation", "codivergence", and "codiversification". These have different meanings and need to be used carefully. The paper mostly deals with "codivergence" between coronaviruses and their host species.
We now provide a list of definitions in Box S1. These definitions are as in our recent article clarifying the differences between these patterns/processes (Perez-Lamarque & Morlon 2024).
Specific comments
L83-L105: This paragraph can be written more concisely.
We prefer to keep this paragraph like this as it contains key explanations that are necessary for understanding our approach and results.
Figure 1: The timescales of the trees are rather confusing. The different scales are indicated by the gray shading but this is easy to overlook. Maybe stretching or compressing the trees horizontally would help to emphasise the different timescales.
Done.
Figure 2: Note that the maximum clade credibility tree is a specific tree sampled from the posterior distribution - it is not a consensus tree. In the figure caption, the meaning of "location" is unclear.
We have removed the word “consensus”, thank you for noting this. We have replaced “location” by “branching order”.
L461: How was the model chosen, and why were different models used in the BEAST and PhyloBayes analyses?
We did our PhyloBayes analyses first and used the LG model following methodology outlined in previous studies using ALE (e.g. Groussin et al. 2017; Dorrell et al. 2021). Unfortunately, the LG model is not available in the default version of BEAST2 so we had to use a different model (the WAG model). We have now run BEAST2 with the LG model (thanks to the BEAST_CLASSIC package) and we obtained very similar results (see Figure below showing the BEAST consensus trees obtained with the WAG or LG models – they only slightly differ by the branching of the u7351 OTU). We have now added this information in the Methods section.
Author response image 1.
L477: It is not clear to me how the PhyloBayes and BEAST analyses differ. Please expand the explanation of why PhyloBayes was used here.
We have now clarified this (lines 594-597).
L568: Why not test explicitly for recombination?
We did test for the occurrence of recombination using several approaches, including OpenRDP (https://github.com/PoonLab/OpenRDP), our own custom code, and Gubbins (Croucher et al. 2015). These tests were however inconclusive, indicating either the absence or presence of recombination, thus suggesting that the palmprint region is too short to infer anything about recombination. We thus do not exclude the possibility that recombination occurred, and test the robustness of our results to recombination by running our analyses on different sub-parts of the palmprint region. We have clarified this in our Material & Methods.
L618: "DNA sequences" -> "RNA sequences"
Done.
The paper contains numerous minor grammatical errors and would benefit from careful proofreading and editing. Please check the use of plurals and apostrophes. Some of the errors are listed below:
L49: "As several" -> "As with several"
Done.
L178: "reconciliates" -> "reconciles"?
Done.
L199: "extent" -> "extant"
Done.
L289: This sentence needs rephrasing to avoid a triple negative ("cannot ... reject ... not present")
Done.
L469: "temporary" -> "temporal"
Done.
L470: "neglectable" -> "negligible"
Done.
L577: "not only relying" -> "not relying only"
Done.
Reviewer #2 (Recommendations For The Authors):
The study is generally well-constructed and its results are convincing. However, considering the availability of a dated host tree, conducting a dated reconciliation analysis could be beneficial. Creating a smaller sub-dataset and performing a dated reconciliation analysis would likely be a valuable addition to the research.
We have now run the dated version of ALE on both the alpha and betacoronaviruses subclades. ALE dated still does not output reconciliations on the betacoronaviruses dataset, but it does on the smaller alphacoronaviruses dataset. We found significant reconciliations, indicating that mammal-alphacoronavirus associations are not random with respect to phylogeny, but the reconciliations involved more host switch and loss events (38 switches + 29 losses) than cospeciation events (65), indicating cophylogenetic signal in the absence of phylogenetic congruence (Perez-Lamarque & Morlon 2024). We now present the results on lines 264-282.
Reviewer #3 (Recommendations For The Authors):
I think the results are written in a very speculative way, with many sentence fragments that should really be part of the discussion.
We have carefully checked our Results section and rephrased or removed formulations that may have been perceived as speculative.
There are a lot of considerations in this manuscript about spread and future pandemics, but I think this is very far from the topic of this paper. When we quantified the coevolutionary risk of bats-betacovs in a recent paper (Forero et al. 2024, Virus Evol.), we only briefly touched upon this discussion because we compared our outputs with a measure of human population density. I don't think the manuscript needs to talk about epidemiology at all, and it would probably be more useful as a purely evo-bio piece.
We think that it is useful to discuss the potential implications of our results for future pandemics, even though we agree that this discussion is rather speculative. We have removed the mention of predictions in the Abstract and have softened our wording in the Discussion.
References:
Croucher, N.J., Page, A.J., Connor, T.R., Delaney, A.J., Keane, J.A., Bentley, S.D., et al. (2015). Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res., 43, e15.
Dorrell, R.G., Villain, A., Perez-Lamarque, B., Audren de Kerdrel, G., McCallum, G., Watson, A.K., et al. (2021). Phylogenomic fingerprinting of tempo and functions of horizontal gene transfer within ochrophytes. Proc. Natl. Acad. Sci., 118, e2009974118.
Edgar, R.C. et al. (2022). Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147.
Groussin, M., Mazel, F., Sanders, J.G., Smillie, C.S., Lavergne, S., Thuiller, W., et al. (2017). Unraveling the processes shaping mammalian gut microbiomes over evolutionary time. Nat. Commun., 8, 14319.
Perez-Lamarque, B. & Morlon, H. (2024). Distinguishing cophylogenetic signal from phylogenetic congruence clarifies the interplay between evolutionary history and species interactions. Syst. Biol.
-
eLife assessment
Maestri et al. report the absence of phylogenetic evidence supporting codiversification of mammalian coronaviruses and their hosts, leading to the important conclusion that the evolutionary history of the virus and its hosts are decoupled through frequent host switches. The evidence for frequent host switching, derived from state-of-the-art probabilistic modeling of co-evolution, is convincing. The study adds a new perspective to the ongoing debate over the timescale of coronavirus evolution.
-
Reviewer #1 (Public Review):
Summary:
In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.
Strengths:
The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.
Weaknesses:
The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.
-
Reviewer #2 (Public Review):
Summary:
In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV-2 outbreak.
The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.
Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.
Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.
A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.
The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.
In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.
Strengths:
The study is conceptually robust, and its conclusions are convincing.
Weaknesses:
The authors could only use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset. The authors did attempt to address this issue in the revision, albeit with limited success.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
In this useful study, the authors tested the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes using modeling and behavioral experiments, claiming that bumblebees rely most on ground-views for homing. However, due to a lack of analysis of the bees' behavior during training and a lack of information as to how the homing behavior of bees develops over time, the evidence supporting their claims is currently incomplete. Moreover, there was concern that the experimental environment was not representative of natural scenes, thus limiting the findings of the study.
-
Reviewer #1 (Public Review):
Summary:
In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.
Strengths:
The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.
Weaknesses:
Views of animals are from a rather small catchment area.
Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).
The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.
-
Reviewer #2 (Public Review):
Summary:
In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.
During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.
In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.
The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:
line 51: "Snapshot models perform best with bird's eye views";
line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it.";
line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."
Strengths:
The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.
Weaknesses:
Modelling:
Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.
Behavioural analysis:
The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.
Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.
Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.
General:
The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).
In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.
-
Author response:
Reviewer 1 (Public Review):
“Summary:
In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.
Strengths:
The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.
Weaknesses:
Views of animals are from a rather small catchment area.
Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).
The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”
Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.
We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.
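For readers unfamiliar with the term, a rotational image difference function compares a memorised snapshot with the current view at every possible heading and looks for the heading with the smallest difference. The sketch below is purely illustrative and is not the model used in our study: it assumes a one-dimensional ring of azimuthal pixel intensities instead of full panoramic images, uses a root-mean-square pixel difference, and the function names `rms_difference`, `rotational_idf` and `best_match` are hypothetical.

```python
import math


def rms_difference(view_a, view_b):
    """Root-mean-square pixel difference between two equally sized views."""
    if len(view_a) != len(view_b):
        raise ValueError("views must have the same number of pixels")
    squared = sum((a - b) ** 2 for a, b in zip(view_a, view_b))
    return math.sqrt(squared / len(view_a))


def rotational_idf(snapshot, current):
    """RMS difference between the snapshot and each azimuthal shift of the current view."""
    n = len(current)
    return [rms_difference(snapshot, current[s:] + current[:s]) for s in range(n)]


def best_match(snapshot, current):
    """Return (shift, difference) at the minimum of the rotational difference curve."""
    curve = rotational_idf(snapshot, current)
    shift = min(range(len(curve)), key=curve.__getitem__)
    return shift, curve[shift]


# Toy panorama: a single two-pixel-wide object on a uniform background
snapshot = [0, 0, 1, 1, 0, 0, 0, 0]
# The same scene seen with the agent's heading rotated
current = snapshot[5:] + snapshot[:5]
shift, difference = best_match(snapshot, current)
```

In this toy case the minimum is unique because the scene contains a single object; with an array of identical, regularly spaced objects several shifts would match almost equally well, which is one way to picture the ambiguity discussed in this exchange.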
We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.
Reviewer 2 (Public Review):
Summary:
In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.
During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.
In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.
The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:
line 51: "Snapshot models perform best with bird's eye views";
line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it.";
line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."
Strengths:
The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.
Weaknesses:
Modelling:
Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.
When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features that were moved between training and testing (neither the camera nor the wall was moved between training and testing). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we will include model results with the arena wall in the revised paper.
As we wanted to investigate whether bees would use ground views or bird’s-eye views to home in a dense environment, we think that catchment volumes would provide information that is qualitatively similar to, though quantitatively more detailed than, that provided by catchment slices. Our approach of catchment slices is sufficient to predict whether ground or bird's-eye views perform better in leading to the nest, and we will, therefore, not include further computations of catchment volumes.
Behavioural analysis:
The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.
Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.
Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.
A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.
Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments. A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.
Our current knowledge of learning flights did not permit these investigations of bee training. Firstly, our setup does not allow us to record each inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended time periods, potentially impairing the motivation of the bees to forage or the survival and development of the colony. Secondly, it is not always clear exactly where bees learn, or whether they learn continuously by weighting their visual experience according to their positions and orientations. This makes it difficult to categorise flights accurately as learning or return flights. Additionally, homing models remain vague about the learning mechanisms at play during learning flights. Therefore, we believe that continued effort is needed to understand bees' learning and homing abilities. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. With this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and will provide these results in a separate article.
While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.
General:
The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).
In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.
We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.
Our lab settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."
Despite the lack of a distant visual panorama, as well as of UV light, wind, and other confounding factors inherent to field work, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious mechanisms for homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.
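For readers unfamiliar with rotational image difference functions, the following is a minimal sketch of the general idea, not the implementation used in the study. It assumes that each view is reduced to a hypothetical one-dimensional panoramic intensity profile (one value per azimuth column) and that the image difference is the root-mean-square pixel difference; the function names and the toy data are illustrative only.

```python
import math


def rot_idf(current_view, snapshot):
    """RMS difference between a stored snapshot and the current view,
    evaluated for every azimuthal rotation of the current view.

    Both arguments are 1-D panoramic intensity profiles of equal length
    (an assumption made for brevity; real views are 2-D images).
    """
    n = len(snapshot)
    if len(current_view) != n:
        raise ValueError("views must have the same number of azimuth columns")
    differences = []
    for shift in range(n):
        # Rotate the current view by `shift` columns, wrapping around 360 degrees.
        rotated = current_view[shift:] + current_view[:shift]
        mse = sum((r - s) ** 2 for r, s in zip(rotated, snapshot)) / n
        differences.append(math.sqrt(mse))
    return differences


def best_heading(current_view, snapshot):
    """Rotation (in columns) that minimises the difference, and that minimum."""
    differences = rot_idf(current_view, snapshot)
    shift = min(range(len(differences)), key=differences.__getitem__)
    return shift, differences[shift]


# Toy data: a hypothetical 12-column snapshot, and the same scene viewed
# after the agent has turned by 3 columns.
snapshot = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0, 1.0, 0.5, 0.35]
turn = 3
current_view = snapshot[-turn:] + snapshot[:-turn]
shift, minimum = best_heading(current_view, snapshot)
```

In this toy case the minimum of the function recovers the applied turn. In a multi-snapshot model, the same comparison would be repeated against each stored snapshot, and the best-matching snapshot would set the heading.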
-
-
www.biorxiv.org www.biorxiv.org
-
Reviewer #1 (Public Review):
Summary:
Das and Menon describe an analysis of a large open-source iEEG dataset (UPENN-RAM). From encoding and recall phases of memory tasks, they analyzed power and phase-transfer entropy as a measure of directed information flow in regions across a hypothesized tripartite network system. The anterior insula (AI) was found to have heightened high gamma power during encoding and retrieval, which corresponded to suppression of high gamma power in the posterior cingulate cortex (PCC) during encoding but not recall. In contrast, directed information flow from (but not to) AI to mPFC/PCC and dorsal posterior parietal/middle frontal cortex is high during both time periods when PTE is analyzed with broadband but not narrowband activity. They claim that these findings significantly advance an understanding of how network communication facilitates cognitive operations such as control over memory and that the AI of the salience network (SN) is responsible for governing the switch between the frontoparietal network (FPN) and default-mode network (DMN) when shifting between externally- and internally-driven processing.
I find this question interesting and important and agree with the authors that iEEG presents a unique opportunity to investigate the temporal dynamics within network nodes. However, I am not convinced that their claims are supported by the results currently presented. In particular, the fact that network-level communication is not modulated significantly compared to rest and does not relate to behavior suggests that PTE analyses may not be tapping into task-relevant communication. Moreover, dissociation of network effects - present during both encoding and recall - from local power suppression effects - present only during encoding - suggests that these sets of results may index separate and not unitary task processes.
Strengths:
- The authors present results from an impressively sized iEEG sample. For reader context, this type of invasive human data is difficult and time-consuming to collect and many similar studies in high-level journals include 5-20 participants, typically not all of whom have electrodes in all regions of interest. It is excellent that they have been able to leverage open-source data in this way.
- Preprocessing of iEEG data also seems sensible and appropriate based on field standards.
- The authors tackle the replication issues inherent in much of the literature by replicating findings across task contexts, demonstrating that the principles of network communication evidenced by their results generalize in multiple task memory contexts. Again, the number of iEEG patients who have multiple tasks' worth of data is impressive.
Weaknesses:
• The motivation for investigating the tripartite network during memory tasks is not currently well-elaborated. Though the authors mention, for example, that "the formation of episodic memories relies on the intricate interplay between large-scale brain networks (p. 4)", there are no citations provided for this statement, and the reader is unable to evaluate whether the nodes and networks evidenced to support these processes are the same as networks measured here.
• In addition, though the tripartite network has been proposed to support cognitive control processes, and the neural basis of cognitive control is the framed focus of this work, the authors do not demonstrate that they have measured cognitive control in addition to simple memory encoding and retrieval processes. Tasks that have investigated cognitive control over memory (such as those cited on p. 13 - Badre et al., 2005; Badre & Wagner, 2007; Wagner et al., 2001; Wagner et al., 2005) generally do not simply include encoding, delay, and recall (as the tasks used here), but tend to include some manipulation that requires participants to engage control processes over memory retrieval, such as task rules governing what choice should be made at recall (e.g., from Badre et al., 2005 Fig. 1: congruency of match, associative strength, number of choices, semantic similarity). Moreover, though there are task-responsive signatures in the nodes of the tripartite networks, concluding that cognitive control is present because cognitive control networks are active would be a reverse inference.
• It is currently unclear if the directed information flow from AI to DMN and FPN nodes truly arises from task-related processes such as cognitive control or if it is a function of static brain network characteristics constrained by anatomy (such as white matter connection patterns, etc.). This is a concern because the authors did not find that influences of AI on DMN or FPN are increased relative to a resting baseline (collected during the task) or that directed information flow differs in successful compared to unsuccessful retrieval. I doubt that this AI influence is 1) supporting a switch between the DMN and FPN via the SN or 2) relevant for behavior if it doesn't differ from baseline-active task or across accuracy conditions. An additional comparison that may help investigate whether this is reflective of static connectivity characteristics would be a baseline comparison during non-task rest or sleep periods.
• Related to the above concern, it is also questionable how directed information flow from AI facilitates switching between FPN and DMN during both encoding and recall if high gamma activity does not significantly differ in AI versus PCC or mPFC during recall as it does during encoding. It seems erroneous to conclude that the network-level communication is happening or happening with the same effect during both task time points when these effects are decoupled in such a way from the power findings.
• Missing information about the methods used for time-frequency conversion for power calculation and the power normalization/baseline-correction procedure bars a thorough evaluation of power calculation methods and results.
If revisions to the manuscript can address concerns about directed information flow possibly being due to anatomical constraints - such as by indicating that directed information flow is not present during non-task rest or sleep - this work may convey important information about the structure and order of communication between these networks during attention to tasks in general. However, the ability of the findings to address cognitive control-specific communication and the nature of neurophysiological mechanisms of this communication - as opposed to the temporal order and structure of recruited networks - may be limited.
Because phase-transfer entropy (PTE) is presented as a "causal" analysis in this investigation, I also believe it is important to highlight for readers recent discussions surrounding the description of "causal mechanisms" in neuroscience (see "Confusion about causation" section from Ross and Bassett, 2024, Nature Neuroscience). A large proportion of neuroscientists (admittedly, myself included) use "causal" only to refer to a mechanism whose modulation or removal (with direct manipulation, such as by lesion or stimulation) is known to change or control a given outcome (such as a successful behavior). As Ross and Bassett highlight, it is debatable whether such mechanistic causality is captured by Granger "causality" (a.k.a. Granger prediction) or the parametric PTE, and the imprecise use of "causation" may be confusing. The authors could consider amending language regarding this analysis if they are concerned about bridging these definitions of causality across a wide audience.
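To make the distinction concrete, here is a minimal sketch of a histogram-based transfer entropy between two phase time series. It is not the authors' pipeline: it assumes the instantaneous phases have already been extracted (for example with a Hilbert transform), and the bin count, delay, and simulated signals are all hypothetical. The quantity it returns measures how much the past phase of one signal improves prediction of the future phase of another, which is a statement about prediction rather than about the effect of an intervention.

```python
import math
import random
from collections import Counter


def bin_phase(phase, n_bins):
    """Map a phase in radians to one of n_bins equal-width bins on the circle."""
    wrapped = phase % (2 * math.pi)
    return min(int(wrapped / (2 * math.pi) * n_bins), n_bins - 1)


def phase_transfer_entropy(source, target, delay, n_bins=8):
    """Plug-in estimate (in nats) of transfer entropy from source to target.

    TE = sum over states of p(y_future, y_past, x_past)
         * log[ p(y_future | y_past, x_past) / p(y_future | y_past) ]
    """
    x = [bin_phase(p, n_bins) for p in source]
    y = [bin_phase(p, n_bins) for p in target]
    triples = Counter()   # counts of (y_future, y_past, x_past)
    pairs_yy = Counter()  # counts of (y_future, y_past)
    pairs_yx = Counter()  # counts of (y_past, x_past)
    singles = Counter()   # counts of y_past
    n = len(y) - delay
    for t in range(n):
        y_future, y_past, x_past = y[t + delay], y[t], x[t]
        triples[(y_future, y_past, x_past)] += 1
        pairs_yy[(y_future, y_past)] += 1
        pairs_yx[(y_past, x_past)] += 1
        singles[y_past] += 1
    te = 0.0
    for (y_future, y_past, x_past), count in triples.items():
        p_joint = count / n
        cond_full = count / pairs_yx[(y_past, x_past)]
        cond_self = pairs_yy[(y_future, y_past)] / singles[y_past]
        te += p_joint * math.log(cond_full / cond_self)
    return te


# Simulated phases (hypothetical): y copies x with a lag of `delay` samples
# plus a small amount of phase noise, so x should predict y but not vice versa.
rng = random.Random(0)
delay = 5
n_samples = 5000
x_phase = [rng.uniform(0, 2 * math.pi) for _ in range(n_samples)]
y_phase = [rng.uniform(0, 2 * math.pi) for _ in range(delay)] + [
    x_phase[t - delay] + rng.gauss(0, 0.2) for t in range(delay, n_samples)
]
te_xy = phase_transfer_entropy(x_phase, y_phase, delay)
te_yx = phase_transfer_entropy(y_phase, x_phase, delay)
```

In this simulation the estimate is larger in the x-to-y direction simply because y is a delayed copy of x. A comparable asymmetry in real recordings would indicate directed predictability, which may or may not correspond to a mechanism that would change behaviour if manipulated.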
-
eLife assessment
In this manuscript, the authors present valuable findings on the apparent role of a salience-network anterior insula node in directing fronto-parietal and default-mode network activity within a tripartite network during control of memory, drawn from an impressive invasive human neurophysiological dataset. While we commend the use of a large intracranial EEG dataset to approach this question, the study at present is incomplete in its methodologies, analysis, and interpretation to support the authors' central claims. The manuscript could be improved by addressing the concerns described.
-
Reviewer #2 (Public Review):
In this study, the authors leverage a large public dataset of intracranial EEG (the University of Pennsylvania RAM repository) to examine electrophysiologic network dynamics involving the participation of salience, frontoparietal, and default mode networks in the completion of several episodic memory tasks. They do this through a focus on the anterior insula (AI; salience network), which they hypothesize may help switch engagement between the DMN and FPN in concert with task demands. By analyzing high-gamma spectral power and phase transfer entropy (PTE; a putative measure of information "flow"), they show that the AI shows higher directed PTE towards nodes of both the DMN and FPN, during encoding and recall, across multiple tasks. They further demonstrate that high-gamma power in the PCC/precuneus is decreased relative to the AI during memory encoding. They interpret these results as evidence of "triple-network" control processes in memory tasks, governed by a key role of the AI.
I commend the authors on leveraging this large public dataset to help contextualize network models of brain function with electrophysiological mechanisms - a key problem in much of the fMRI literature. I also appreciate that the authors emphasized replicability across multiple memory tasks, in an effort to demonstrate conserved or fundamental mechanisms that support a diversity of cognitive processes. However, I believe that their strong claims regarding causal influences within circumscribed brain networks cannot be supported by the evidence as presented. In my efforts to clearly communicate these inadequacies, I will suggest several potential analyses for the authors to consider that might better link the data to their central hypotheses.
(1) As a general principle, the effects that the authors show - both in regards to their high-gamma power analysis and PTE analysis - do not offer sufficient specificity for a reader to understand whether these are general effects that may be repeated throughout the brain, or whether they reflect unique activity to the networks/regions that are laid out in the Introduction's hypothesis. This lack of specificity manifests in several ways, and is best communicated through examples of control analyses.
First, the PTE analysis is focused solely on the AI's interactions with nodes of the DMN and FPN; while it makes sense to focus on this putative "switch" region, the fact that the authors report significant PTE from the AI to nodes of both networks, in encoding and retrieval, across all tasks and (crucially) also at baseline, raises questions about the meaningfulness of this statistic. One way to address this concern would be to select a control region that would be expected to have little/no directed causal influence on these networks and repeat the analysis. Alternatively (or additionally), the authors could examine the time course of PTE as it evolves throughout an encoding/retrieval interval, and relate that to the timing of behavioral events or changes in high-gamma power. This would directly address an important idea raised in their own Discussion, "the AI is well-positioned to dynamically engage and disengage with other brain areas."
Second, the authors state that high-gamma suppression in the PCC/precuneus relative to the AI is an anatomically specific signature that is not present in the FPN. This claim does not seem to be supported by their own evidence as presented in the Supplemental Data (Figures S2 and S3), which to my eye show clear evidence of relative suppression in the MFG and dPPC (e.g. S2a and S3a, most notably) which are notated as "significant" with green bars. I appreciate that the magnitude of this effect may be greater in the PCC/precuneus, but if this is the claim it should be supported by appropriate statistics and interpretation.
(2) I commend the authors on emphasizing replicability, but I found their Bayes Factor (BF) analysis to be difficult to interpret and qualitatively inconsistent with the results that they show. For example, the authors state that BF analysis demonstrates "high replicability" of the gamma suppression effect in Figure 3a with that of 3c and 3d. While it does appear that significant effects exist across all three tasks, the temporal structure of high gamma signals appears markedly different across tasks in ways that may be biologically meaningful. Moreover, it appears that the BF analysis did not support replicability between VFR and CATVFR, which is very surprising; these are essentially the same tasks (merely differing in the presence of word categories) and would be expected to have the highest degree of concordance, not the lowest. I would suggest the authors try to analytically or conceptually reconcile this surprising finding.
To aid in interpretability, it would be extremely helpful for the authors to assess across-task similarity in high-gamma power on a within-subject basis, which they are well-powered to do. For example, could they report the correlation coefficient between HGP timecourses in paired-associates versus free-recall tasks, to better establish whether these effects are consistent on a within-subject basis? This idea could similarly be extended to the PTE analysis. Across-subject correlations would also be a welcome analysis that may provide readers with better-contextualized effect sizes than the output of a Bayes Factor analysis.
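As an illustration of the suggested within-subject analysis, the sketch below computes a Pearson correlation between the high-gamma power timecourses of two tasks for each subject who contributed data to both. It is only a sketch under stated assumptions: the subject identifiers and power values are made up, and the timecourses are assumed to be trial-averaged and resampled onto a common time base.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def within_subject_similarity(task_a, task_b):
    """Correlate the two tasks' timecourses for every subject present in both.

    task_a, task_b: dicts mapping a subject ID to a list of power values.
    """
    shared = sorted(set(task_a) & set(task_b))
    return {subject: pearson_r(task_a[subject], task_b[subject]) for subject in shared}


# Hypothetical data: S01 shows the same suppression profile in both tasks,
# S02 shows opposite profiles, and S03 only performed one task.
paired_associates = {
    "S01": [0.0, -0.2, -0.5, -0.4, -0.1],
    "S02": [0.0, 0.1, 0.2, 0.1, 0.0],
    "S03": [0.0, -0.1, -0.3, -0.2, 0.0],
}
free_recall = {
    "S01": [0.1, -0.1, -0.4, -0.3, 0.0],
    "S02": [0.0, -0.1, -0.2, -0.1, 0.0],
}
similarity = within_subject_similarity(paired_associates, free_recall)
```

The distribution of these per-subject coefficients (for example after a Fisher z-transform) could then be reported as an effect size alongside the Bayes Factor output.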
-
-
www.biorxiv.org www.biorxiv.org
-
Reviewer #1 (Public Review):
Summary:
The investigation delves into allosteric modulation within the glycosylated SARS-CoV-2 spike protein, focusing on the fatty acid binding site. This study uncovers intricate networks connecting the fatty acid site to crucial functional regions, potentially paving the way for developing innovative therapeutic strategies.
Strengths:
This article's key strength lies in its rigorous use of dynamic nonequilibrium molecular dynamics (D-NEMD) simulations. This approach provides a dynamic perspective on how the fatty acid binding site influences various functional regions of the spike. A comprehensive understanding of these interactions is crucial in deciphering the virus's behavior and identifying potential targets for therapeutic intervention.
Weaknesses:
The presented evidence is compelling but would be stronger if the study were supported by sequence analysis, which would give a more complete view of the allosteric networks. The analysis of the simulation results is only partially aligned with the discussion, because an asymmetry in the perturbations generated by D-NEMD is observed across the replicates and the monomers, even with 210 nonequilibrium MD simulations of 10 ns each. While the authors claim that the strategy used in this article has been previously validated, the complexity of the spike and of the interactions analyzed requires a robust statistical analysis, which is not shown quantitatively. The investigation examines the allosteric modulation within the glycosylated SARS-CoV-2 spike protein, emphasizing the significance of the fatty acid binding site in influencing the structural dynamics and communication pathways essential for viral function, potentially facilitating the development of novel therapeutic strategies. The presented evidence is compelling but needs to be supported by sequence analysis, which would make the results easier for the scientific community to interpret.
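One simple form the requested quantitative analysis could take is sketched below: for each residue, the displacement between paired equilibrium and nonequilibrium trajectories at an equivalent time point is averaged over replicates and reported with its standard error. This is a hedged illustration, not the authors' protocol; the coordinate layout, the use of the mean displacement magnitude, and the toy numbers are all assumptions.

```python
import math


def dnemd_response(equilibrium, nonequilibrium):
    """Per-residue mean displacement and standard error across replicates.

    equilibrium, nonequilibrium: lists over replicates; each replicate is a
    list of per-residue (x, y, z) coordinates taken at the same time after
    the perturbation. Replicate k of one list is paired with replicate k of
    the other.
    """
    n_replicates = len(equilibrium)
    n_residues = len(equilibrium[0])
    results = []
    for residue in range(n_residues):
        # Displacement of this residue in each equilibrium/nonequilibrium pair.
        displacements = [
            math.dist(nonequilibrium[k][residue], equilibrium[k][residue])
            for k in range(n_replicates)
        ]
        mean = sum(displacements) / n_replicates
        variance = sum((d - mean) ** 2 for d in displacements) / (n_replicates - 1)
        standard_error = math.sqrt(variance / n_replicates)
        results.append((mean, standard_error))
    return results


# Toy data (hypothetical): 3 replicates, 2 residues, arbitrary length units.
equilibrium = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
nonequilibrium = [
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],
    [(3.0, 4.0, 0.0), (2.0, 0.0, 0.0)],
    [(3.0, 4.0, 0.0), (3.0, 0.0, 0.0)],
]
response = dnemd_response(equilibrium, nonequilibrium)
```

Computing such per-residue errors separately for each monomer would show whether the asymmetry between chains exceeds the replicate-to-replicate variability.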
Minor considerations:
Figure S3 shows a discrepancy in the values presented for residue S325 across the plots of chains A, B, and C. While the chain A plot shows a value near 0.1, the chain B and C plots do not show any value.
Please explain why the plots of figures S6, S7, and S8 show significant changes in several regions, such as RBM and Furin Site. Can these changes be explained?
The flow of the allosteric interaction is difficult to visualize just by looking at structures. Could you please include a diagram showing the flow of allosteric interactions (in a sequence diagram or using the structure of the protein)? Alternatively, could you include a vector representation showing how the perturbation applied at the FA site propagates to other relevant regions of the spike protein?
-
Reviewer #3 (Public Review):
Summary:
In a previous study, the authors analyzed the dynamics of the SARS-CoV2 spike protein through lengthy MD simulations and an out-of-equilibrium sampling scheme. They identified an allosteric interaction network linking a lipid-binding site to other structurally important regions of the spike. However, this study was conducted without considering the impact of glycans. It is now known that glycans play a crucial role in modulating spike dynamics. This new manuscript investigates how the presence of glycans affects the allosteric network connecting the lipid binding site to the rest of the spike. The authors conducted atomistic equilibrium and out-of-equilibrium MD simulations and found that while the presence of glycans influences the structural responses, it does not fundamentally alter the connectivity between the fatty acid site and the rest of the spike.
Strengths:
The manuscript's findings are based on an impressive amount of sampling. The methods and results are clearly outlined, and the analysis is conducted meticulously.
Weaknesses:
The study does not clearly show any new findings. The authors themselves acknowledge that the manuscript mainly presents negative results, indicating that glycans do not significantly impact the allosteric network previously reported in other publications. All the results in the paper are based on a single methodology, and additional independent approaches would be needed to confirm the robustness of these findings. Allosteric networks arise from subtle correlations in protein structural dynamics, and it's uncertain whether the results discussed in this manuscript stem exclusively from the chosen force field and other modeling and analysis decisions, or if they indeed reflect something real.
-
eLife assessment
This manuscript focuses on understanding if and how the glycosylation of SARS-CoV2 spike protein affects a putative allosteric network of interactions controlled by the binding of a fatty acid. The main conclusion is that glycans do not significantly affect the network of allosteric interactions. This useful information - albeit mainly consisting of negative results - is based on solid evidence. It will be of interest to scientists focusing on SARS CoV2 protein structure and dynamics.
-
Reviewer #2 (Public Review):
This is a nice paper illustrating the use of equilibrium/non-equilibrium MD simulations to explore allosteric communication in the Spike protein. The results are described in detail and suggest a complex network of signal transmission patterns. The topic is not completely novel, as it has been studied before by the same authors, and the impact of glycosylation is moderate and localized at the furin site, so not many new conclusions emerge here. It is suggested that mutations are commonly found in the communication pathway, which is interesting, but the authors fail to provide evidence that this is related to positive selection and not simply to a random effect related to mutations at points that are not crucial for stability or function. One interesting point is the connection of the FA site with an additional site binding a heme group. It would be interesting to test reversibility: does removal of the ligand at this site produce a perturbation at the FA site, and does it produce other effects suggesting a cascade of allosteric effects? Finally, the paper lacks details to help reproducibility; in particular, I do not see details on the D-NEMD calculations.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This is an important study, as PIM1/2 control of protein synthesis in differentiated cells has implications beyond T cells. The evidence is convincing in that it makes extensive use of the mouse knockout model and validation in mouse T cells with inhibitors. A rescue experiment in mouse KO T cells would be even stronger than the inhibitor studies to validate the KO phenotype and the evidence would be truly impressive if the results from the rescue experiment support the working model. Extending the observations to human T cells would also be a step towards translation and would further increase the potential impact of the work.
-
Reviewer #1 (Public Review):
Summary and Strengths:
The study focuses on PIM1 and 2 in CD8 T cell activation and differentiation. These two serine/threonine kinases belong to a large network of serine/threonine kinases that acts following engagement of the TCR and of cytokine receptors and phosphorylates proteins that control transcriptional, translational and metabolic programs that result in effector and memory T cell differentiation. The expression of PIM1 and PIM2 is induced by the T-cell receptor and several cytokine receptors. The present study capitalized on high-resolution quantitative analysis of the proteomes and transcriptomes of Pim1/Pim2-deficient CD8 T cells to decipher how the PIM1/2 kinases control TCR-driven activation and IL-2/IL-15-driven proliferation, and differentiation into effector T cells.
Quantitative mass spectrometry-based proteomics analysis of naïve OT1 CD8 T cells stimulated with their cognate peptide showed that the PIM1 protein was induced within 3 hours of TCR engagement and its expression was sustained at least up to 24 hours. The kinetics of PIM2 expression was protracted as compared to that of PIM1. Such TCR-dependent expression of PIM1/2 correlated with the analysis of both Pim1 and Pim2 mRNA. In contrast, Pim3 mRNA was only expressed at very low levels and the PIM3 protein was not detected by mass spectrometry. Therefore, PIM1 and 2 are the major PIM kinases in recently activated T cells. Pim1/Pim2 double knockout (Pim dKO) mice were generated on a B6 background and found to have a lower number of splenocytes. No difference in TCR/CD28-driven proliferation was observed between WT and Pim dKO T cells over 3 days in culture. Quantitative proteomics of >7000 proteins further revealed no substantial quantitative or qualitative differences in protein content or proteome composition. Therefore, other signaling pathways can compensate for the lack of PIM kinases downstream of TCR activation.
Considering that PIM1 and PIM2 kinase expression is regulated by IL-2 and IL-15, antigen-primed CD8 T cells were expanded in IL-15 to generate memory phenotype CD8 T cells or expanded in IL-2 to generate effector cytotoxic T lymphocytes (CTL). Analysis of the survival, proliferation, proteome, and transcriptome of Pim dKO CD8 T cells kept for 6 days in IL-15 showed that PIM1 and PIM2 are dispensable to drive the IL-15-mediated metabolic or differentiation programs of antigen-primed CD8 T cells. Moreover, Pim1/Pim2-deficiency had no impact on the ability of IL-2 to maintain CD8 T cell viability and proliferation. However, WT CTL downregulated the expression of CD62L whereas the Pim dKO CTL sustained higher CD62L expression. Pim dKO CTL were also smaller and less granular than WT CTL. Comparison of the proteome of day 6 IL-2 cultured WT and Pim dKO CTL showed that the latter expressed lower levels of the glucose transporters, SLC2A1 and SLC2A3, of a number of proteins involved in fatty acid and cholesterol biosynthesis, and CTL effector proteins such as granzymes, perforin, IFNg, and TNFa. Parallel transcriptomics analysis showed that the reduced expression of perforin and some granzymes correlated with a decrease in their mRNA whereas the decreased protein levels of granzymes B and A, and the glucose transporters SLC2A1 and SLC2A3 did not correspond with decreased mRNA expression. Therefore, PIM kinases are likely required for IL-2 to maximally control protein synthesis in CD8 CTL. Along that line, the translational repressor PDCD4 was increased in Pim dKO CTL and pan-PIM kinase inhibitors caused a reduction in protein synthesis rates in IL-2-expanded CTL. Finally, the differences between Pim dKO and WT CTL in terms of CD62L expression meant that Pim dKO CTL, but not WT CTL, retained the capacity to home to secondary lymphoid organs.
In conclusion, this thorough and solid study showed that the PIM1/2 kinases shape the effector CD8 T cell proteomes rather than transcriptomes and are important mediators of IL2-signalling and CD8 T cell trafficking.
Weaknesses:
None identified by this reviewer.
-
Reviewer #2 (Public Review):
Summary:
Using a suite of techniques (e.g., RNA seq, proteomics, and functional experiments ex vivo) this paper extensively focuses on the role of PIM1/2 kinases during CD8 T-cell activation and cytokine-driven (i.e., IL-2 or IL-15) differentiation. The authors' key finding is that PIM1/2 enhances protein synthesis in response to IL-2 stimulation, but not IL-15, in CD8+ T cells. Loss of PIM1/2 made T cells less 'effector-like', with lower granzyme and cytokine production, and a surface profile that maintained homing towards secondary lymphoid tissue. The cytokines the authors focus on are IL-15 and IL-2, which drive naïve CD8 T cells towards memory or effector states, respectively. Although PIM1/2 are upregulated in response to T-cell activation and cytokine stimulation (e.g., IL-15, and to a greater extent, IL-2), using T cells isolated from a global mouse genetic knockout background of PIM1/2, the authors find that PIM1/2 did not significantly influence T-cell activation, proliferation, or expression of anything in the proteome under anti-CD3/CD28 driven activation with/without cytokine (i.e., IL-15) stimulation ex vivo. This is perhaps somewhat surprising given PIM1/2 is upregulated, albeit to a small degree, in response to IL-15, and yet PIM1/2 did not seem to influence CD8+ T cell differentiation towards a memory state. Even more surprising is that IL-15 was previously shown to influence the metabolic programming of intestinal intraepithelial lymphocytes, suggesting cell-type specific effects from PIM kinases. What the authors went on to show, however, is that PIM1/2 KO altered CD8 T cell proteomes in response to IL-2. Using proteomics, they saw increased expression of homing receptors (i.e., L-selectin, CCR7), but reduced expression of metabolism-related proteins (e.g., GLUT1/3 & cholesterol biosynthesis) and effector-function related proteins (e.g., IFNy and granzymes). Rather neatly, by performing both RNA-seq and proteomics on the same IL-2 stimulated WT vs. PIM1/2 KO cells, the authors found that changes at the proteome level were not corroborated by differences in RNA, uncovering that PIM1/2 predominantly influence protein synthesis/translation. Effectively, PIM1/2 knockout reduced the differentiation of CD8+ T cells towards an effector state. In vivo adoptive transfer experiments showed that PIM1/2KO cells homed better to secondary lymphoid tissue, presumably owing to their heightened L-selectin expression (although this was not directly examined).
Strengths:
Overall, I think the paper is scientifically good, and I have no major qualms with the paper. The paper as it stands is solid, and while the experimental aim of this paper was quite specific/niche, it is overall a nice addition to our understanding of how serine/threonine kinases impact T cell state, tissue homing, and functionality. Of note, they hint towards a more general finding that kinases may have distinct behaviour in different T-cell subtypes/states. I particularly liked their use of matched RNA-seq and proteomics to first suggest that PIM1/2 kinases may predominantly influence translation (then going on to verify this via their protein translation experiment - although I must add this was only done using PIM kinase inhibitors, not the PIM1/2KO cells). I also liked that they used small molecule inhibitors to acutely reduce PIM1/2 activity, which corroborated some of their mouse knockout findings - this experiment helps resolve any findings resulting from potential adaptation issues from the PIM1/2 global knockout in mice but also gives it a more translational link given the potential use of PIM kinase inhibitors in the clinic. The proteomics and RNA seq dataset may be of general use to the community, particularly for analysis of IL-15 or IL-2 stimulated CD8+ T cells.
Weaknesses:
It would be good to perform some experiments in human T cells too, given the ease of e.g., the small molecule inhibitor experiment. Would also be good for the authors to include a few experiments where PIM1/2 have been transduced back into the PIM1/2 KO T cells, to see if this reverts any differences observed in response to IL-2 - although the reviewer notes that the timeline for altering primary T cells via lentivirus/CRISPR may be on the cusp of being practical such that functional experiments can be performed on day 6 after first stimulating T cells. Other experiments could also look at how PIM1/2 KO influences the differentiation of T cell populations/states during ex vivo stimulation of PBMCs or in vivo infection models using (high-dimensional) flow cytometry (rather than using bulk proteomics/RNA-seq, which only provide an overview of all cells combined). Alongside this, performing a PCA of bulk RNA-seq/proteomes of untreated vs. IL-2 vs. IL-15 stimulated WT and PIM1/2 knockout T cells would help cement their argument in the discussion about PIM1/2 knockout cells being distinct from a memory phenotype.
-
-
www.biorxiv.org www.biorxiv.org
-
Author response:
The following is the authors’ response to the original reviews.
Editors’ recommendations for the authors
The reviewers recommend the following:
(a) Digging deeper into the discussion of the density-dependent dispersal.
(b) Clarifying the microfluidic setup.
(c) Clarifying the description and interpretation of the transcriptomic evidence.
(d) Toning down carbon cycle connections (some reviewers felt the evidence did not fully support the claims).
We would like to thank the editors for their thoughtful evaluation of our manuscript and their clear suggestions. We have revised the manuscript in the light of these comments, as we outline below and address in detail in the point-by-point response to the reviewers’ comments that follows.
(a) We have expanded the discussion of density-dependent dispersal and revised Figure 2C to improve clarity.
(b) We have also added further information concerning the microfluidic setup in the results section and provide an illustration of the setup in a new figure panel, Figure 1A.
(c) Addressing the reviewers’ comments on the transcriptomic analysis, we have added more information in the description and interpretation of the results.
(d) We have rephrased the text describing the role of degradation-dispersal cycles for carbon cycling to highlight it as the motivation of this study and emphasize the link to literature on foraging, without creating expectations of direct measurements of global carbon cycling.
Public Reviews:
Reviewer #1 (Public Review):
[...]
Weaknesses:
Much of the genetic analysis, as it stands, is quite speculative and descriptive. I found myself confused about many of the genes (e.g., quorum sensing) that pop up enriched during dispersal quite in contrast to my expectations. While the authors do mention some of this in the text as worth following up on, I think the analysis as it stands adds little insight into the behaviors studied. However, I acknowledge that it might have the potential to generate hypotheses and thus aid future studies. Further, I found the connections to the carbon cycle and marine environments in the abstract weak --- the microfluidics setup by the authors is nice, but it provides limited insight into naturalistic environments where the spatial distribution and dimensionality of resources are expected to be qualitatively different.
We thank the reviewer for their suggestions to improve our manuscript. We agree that the original manuscript would have benefitted from more detailed interpretation of the observed changes in gene expression. We have revised the manuscript to elaborate on the interpretation of the changes in expression of quorum sensing genes (see response to reviewer 1, comment 3), motility genes (see response to reviewer 1, comment 6), alginate lyase genes (see response to reviewer 1, comment 7 and reviewer 2, comment 2), and ribosomal and transporter genes (see response to reviewer 2, comment 2).
In general, we think that the gene expression study not only supports the phenotypic observations that we made in the microfluidic device, such as the increased swimming motility when exposed to digested alginate medium, but also adds further insights. Our reasoning for studying the transcriptomes in well-mixed batch cultures was the inability to study gene expression dynamics to support the phenotypic observations about differential motility and chemotaxis in our microfluidics setup. The transcriptomic data clearly show that even in well-mixed environments, growth on digested alginate instead of alginate is sufficient to increase the expression of motility and chemotaxis genes. In addition, the finding that expression of alginate lyases and metabolic genes is increased during growth on digested alginate was revealed through the analysis of transcriptomes, something which would not have been possible in the microfluidic setup. We agree with the reviewer that our analyses implicate further, perhaps unexpected, mechanisms like quorum sensing in the cellular response to breakdown products, and that this represents an interesting avenue for further studies.
Finally, we also agree with the reviewer that it would be good to be more explicit in the text that our microfluidic system cannot fully capture the complex dynamics of natural environments. Our approach does, however, allow the characterization of cellular behaviors at spatial and temporal scales that are relevant to the interactions of bacteria, and thus provides a better understanding of colonization and dispersal of marine bacteria in a manner that is not possible through in situ experiments. We have edited our manuscript to highlight this and modified our statements regarding carbon cycling towards emphasizing the role of degradation-dispersal cycles in remineralization of polysaccharides (see response to reviewer 1, comment 2).
Reviewer #2 (Public Review):
[...]
Weaknesses:
The explanation of the microfluidics measurements is somewhat confusing but I think this could be easily remedied. The quantitative interpretation of the dispersal data could also be improved and I'm not clear if the data support the claim made.
We thank the reviewer for their comments and helpful suggestions. We have revised the manuscript with these suggestions in mind and believe that the manuscript is improved by a more detailed explanation of the microfluidic setup. We have added more information in the text (detailed in response to reviewer 2, comments 1 and 2) and have added a depiction of the microfluidic setup (Fig. 1A). We have also modified the presentation and discussion of the dispersal data (Fig. 2C), as described in detail below in response to reviewer 2, comment 4, and argue that they clearly show density-dependent dispersal. We believe that this modification of how the results are presented provides a more convincing case for our main conclusion, namely that the presence of degradation products controls bacterial dispersal in a density-dependent manner.
Reviewer #3 (Public Review):
[...]
Weaknesses:
I find this paper very descriptive and speculative. The results of the genetic analyses are quite counterintuitive; therefore, I understand the difficulty of connecting them to the observations coming from experiments in the microfluidic device. However, they could be better placed in the literature of foraging - dispersal cycles, beyond bacteria. In addition, the interpretation of the results is sometimes confusing.
We thank the reviewer for their suggestions to improve the manuscript. We have edited the manuscript to interpret the results of this study more clearly, in particular with regard to the fact that breakdown products of alginate cause cell dispersal (see response to reviewer 2, comment 1), gene expression changes of ribosomal proteins and transporters (see response to reviewer 2, comment 2), as well as genes relating to alginate catabolism (see response to reviewer 2, comment 3).
To provide more context for the interpretation of our results we now also embed our findings in more detail in the previous work on foraging strategies and dispersal tradeoffs.
Recommendations For The Authors:
Reviewer #1 (Recommendations For The Authors):
(1) The authors should clarify in more detail what they mean by density dependence in Figure 2. Usually density dependence refers to a per capita dependence, but here it seems that the per capita rate of dispersal might be roughly independent of density (Figure 2c; if you double the number of cells it doubles the number of cells leaving). Rather it seems the dispersal is such that the density of remaining cells falls below a threshold (~300 cells).
We thank the reviewer for raising this important point. To analyze the data more explicitly in terms of per capita dependence and so make the density dependence in the dispersal from the microfluidic chambers more clear, we have modified Figure 2C and edited the text.
In the modified Figure 2C, we computed the fraction of dispersed cells for each chamber (i.e., the change in cell number divided by the cell number at the time of the nutrient switch). This quantity directly reveals the per-capita dependence, as mentioned by reviewer 1, and is now represented on the y-axis of Figure 2C instead of the absolute change in cell number.
These data demonstrate that the fraction of dispersed cells increases with increasing numbers of cells present in the chamber at the time of switching, with more highly populated chambers showing a higher fraction of dispersed cells. These findings indicate that there is a strong density dependence in the dispersal process.
As pointed out by reviewer 1, another interesting aspect of the data is the transition at low cell number. The fraction of dispersed cells is negative in the case of the chamber with approximately 70 cells, consistent with no dispersal at this low density, and a moderate density increase as a function of continued growth.
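The per-capita measure described above can be illustrated with a minimal sketch. The chamber counts below are hypothetical and are not the measured data, and the sign convention (net loss of cells counted as positive dispersal, so that net growth gives a negative value) is an assumption chosen to match the description of the chamber with approximately 70 cells.

```python
# Hypothetical cell counts per chamber, for illustration only (not the measured data):
# (cells at the nutrient switch, cells remaining after the dispersal window).
chambers = [(70, 77), (300, 240), (500, 300), (800, 320)]


def fraction_dispersed(n_start, n_end):
    """Per-capita dispersal: net loss of cells divided by the count at the switch.

    Assumed sign convention: positive means net loss (dispersal),
    negative means net gain (continued growth outweighs dispersal).
    """
    if n_start <= 0:
        raise ValueError("n_start must be positive")
    return (n_start - n_end) / n_start


for n_start, n_end in chambers:
    # Print the starting count next to the dispersed fraction for each chamber.
    print(n_start, round(fraction_dispersed(n_start, n_end), 2))
```

If the per-capita rate were independent of density, this fraction would be roughly constant across chambers. A fraction that rises with the starting count is what density-dependent dispersal predicts.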
In addition to the new analysis presented in Figure 2C, we have modified the paragraph that discusses this result as follows (line 208):
“We indeed found that the nutrient switch caused a few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”
(2) The authors should tone down their claims about the carbon cycle in the abstract. I do not believe the results as they stand could be used to understand degradation-dispersal cycles in marine environments relevant to the carbon cycle, since these behaviors have been studied in microfluidic environments which in my understanding are quite different. As such, statements such as "degradation-dispersal cycles are an integral part in the global carbon cycle, we know little about how cells alternate between degradation and motility" and "Overall, our findings reveal the cellular mechanisms underlying bacterial degradation-dispersal cycles that drive remineralization in natural environments" are overstated in the abstract.
We appreciate the reviewer’s comments regarding the connections of our work with the carbon cycle. We have now rephrased these statements in our manuscript to describe a potential connection between our work and the marine carbon cycle. The colonization of polysaccharide particles by bacteria and subsequent degradation has been widely acknowledged to play a significant role in controlling the carbon flow in marine ecosystems (Fenchel, 2002; Preheim et al., 2011; Yawata et al., 2014, 2020). We still refer to carbon flow in the revised manuscript, though cautiously, as microbial remineralization of biomass is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000; Jiao et al., 2024). As stated in the previous version of the manuscript, the main motivation of our work was to study the growth behaviors of marine heterotrophic bacteria during polysaccharide degradation, especially to understand when bacteria depart already colonized and degraded particles and find novel patches to grow and degrade, a process that is poorly understood. Therefore, it is conceivable that degradation-dispersal cycles do play a role in the flow of carbon in marine ecosystems. However, we acknowledge that the carbon cycle is influenced by a multitude of biological and chemical processes, and the bacterial degradation-dispersal cycle might not be the sole mechanism at play.
We also appreciate the reviewer’s comments highlighting that the complexity of natural environments is not fully captured in our microfluidics system. However, our microfluidics setup does allow us to quantify responses and behaviors of microbial groups at high spatial and temporal resolution, especially in the context of environmental fluctuations. Microbes in nature interact at small spatial scales and have to respond to changes in the environment, and the microfluidics setup enables the quantification of these responses. Moreover, dispersal of the bacterium V. cyclitrophicus, which we use in our study, has been previously observed even during growth on particulate alginate (Alcolombri et al., 2021), but the cues and regulation controlling dispersal behaviors have been unclear. Microfluidic experiments have now allowed us to study this process in a highly quantitative manner, and the results align well with observations from experiments in more nature-like settings. These quantitative experiments on bacterial strains isolated from marine particles are expected to constrain quantitative models of carbon degradation in the ocean (Nguyen et al., 2022).
We have now adjusted our statements throughout our manuscript to reflect the knowledge gaps in understanding the triggers of degradation-dispersal cycles and their links with carbon flow in marine ecosystems. The revised manuscript, especially, contains the following statements (line 47 and line 60):
“Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”
“Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”
(3) The authors should clarify why they think quorum-sensing genes are increased in expression on digested alginate. The authors currently mention that QS could be used to trigger dispersal, but given the timescales of dispersal in Figure 2 (~half an hour), I find it hard to believe that these genes are expressed and have the suggested effect on those timescales. As such I would have expected the other way round - for QS genes to be expressed highly during alginate growth, so that density could be sensed and responded to. Please clarify.
We have now clarified this point in the revised manuscript. While the triggering of dispersal by quorum-sensing genes may indeed appear counterintuitive, and the response is rapid (we see dispersal of cells within 30-40 minutes), both observations are in line with previous studies in another model organism, Vibrio cholerae. The dispersal time is similar to the dispersal time of V. cholerae cells from biofilms, as described by Singh and colleagues (Figure 1E of Ref. Singh et al., 2017). In that case, induction of the quorum sensing dispersal regulator HapR was observed during biofilm dispersal within one hour after the switch of condition (Fig. 2, middle panel of Ref. Singh et al., 2017). Even though the specific quorum sensing signaling molecules are probably different in our strain (there is no annotated homolog of the hapR gene in V. cyclitrophicus), we observed that the full set of quorum sensing genes was enriched in cells growing on digested alginate (as reported in line 314 and Fig. 4A).
We have added this information in the manuscript (line 317):
“The set of quorum sensing genes was also positively enriched in cells growing on digested alginate (Fig. 4A and S4F, Table S13). This role in dispersal is in agreement with a previous study that showed induction of the quorum sensing master regulator in V. cholerae cells during dispersal from biofilms on a similar time scale as here (less than an hour) [28].”
Reviewer #2 (Recommendations For The Authors):
(1) Around line 144 - I don't really understand how you flow alginate through the microfluidic platform. It seems if the particles are transiently going through the microfluidic chamber then the flow rate and hence residence time of the alginate particles will matter a lot by controlling the time the cells have to colonize and excrete enzymes for alginate breakdown. Or perhaps the alginate is not particulate but is instead a large but soluble polymer? I think maybe a schematic of the microfluidic device would help -- there is an implicit assumption that we are familiar with the Dal Co et al device, but I don't recall its details and maybe a graphic added to Figure 1 would help.
a. In reviewing the Dal Co paper I see that cells are trapped and the medium flows through channels and the plane where the cells are held. I am still a little confused about the size of the polymeric alginate -- large scale (>1um) particles or very small polymers?
We have now provided a detailed description of our microfluidic experimental system. At the start of the experiments, cells are in fact not trapped within the microfluidic device, but grow and can move freely within a chamber designed with dimensions (sub-micron heights) so that growth occurs only as a monolayer. Cells were exposed to nutrients, either alginate or alginate digestion products, both in soluble form (not particles). These compounds were flowed into the device through a main channel, but entered the flow-free growth chambers by diffusion. To make these aspects of our experiments clearer, we have added further information on this in the Materials & Methods section (line 556), added this information in the abstract (line 51), and in the results (line 123).
To make our microfluidic setup clearer, we have followed this advice and added a schematic as Figure 1A and have added more information on the setup to the main text (line 153):
“In brief, the microfluidic chips are made of an inert polymer (polydimethylsiloxane) bound to a glass coverslip. The PDMS layer contains flow channels through which the culture medium is pumped continuously. Each channel is connected to several growth chambers that are laterally positioned. The dimensions of these growth chambers (height: 0.85 µm, length: 60 µm, width: 90-120 µm) allow cells to freely move and grow as monolayers. The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion [15,16,4,17,8]. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement [4]. This setup combined with time-lapse microscopy allowed us to follow the development of cell communities over time.”
(2) What makes this confusing is the difference between Figure 1C and Figure S2A -- the authors state that the difference in Figure 1C is due to dispersal, but is there flow through the microfluidic device? So what role does that flow through the device have in dispersal? Is the adhesion of the cell groups driven at all by a physical interaction with high molecular weight polymers in the microfluidic devices or is this purely a biological effect? Could this also be explained by different real concentrations of nutrients in the two cases?
We realize from this comment that the role of flow of the medium in the microfluidic setup was not clearly addressed in our manuscript. In fact, cells were not exposed to flow, and nutrients were provided to the growth chambers by diffusion. We have added a clearer explanation of this point on line 158:
“The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion [15,16,4,17,8]. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement [4].”
One purely physical effect that we anticipate is that a high viscosity of the medium could immobilize cells. To address this point, we measured the viscosity of both alginate and digested alginate and conclude that the increase in viscosity is not strong enough to immobilize cells. We added a statement in the text (line 170):
“To test the role of increased viscosity of polymeric alginate in causing the increased aggregation of cells, we measured the viscosity of 0.1% (w/v) alginate or digested alginate dissolved in TR media. For alginate, the viscosity was 1.03±0.01 mPa·s (mean and standard deviation of three technical replicates) whereas the viscosity of digested alginate in TR media was found to be 0.74±0.01 mPa·s. Both these values are relatively close to the viscosity of water at this temperature (0.89 mPa·s [18]) and, while they may affect swimming behavior [19], they are insufficient to physically restrain cell movement [20].”
as well as a section in the Materials and Methods (line 594):
“Viscosity of the alginate and digested alginate solution
We measured the viscosity of alginate solutions using shear rheology measurements. We use a 40 mm cone-plate geometry (4° cone) in a Netzsch Kinexus Pro+ rheometer. 1200 µL of sample was placed on the bottom plate, the gap was set at 150 µm and the sample trimmed. We used a solvent trap to avoid sample evaporation during measurement. The temperature was set to 25 °C using a Peltier element. We measure the dynamic viscosity over a range of shear rates of 0.1 – 100 s⁻¹. We report the viscosity of each solution as the average viscosity measured over the shear rates 10 – 100 s⁻¹, where the shear-dependence of the viscosity was low.
We measured the viscosity of 0.1% (w/v) alginate dissolved in TR media, which was 1.03±0.01 mPa·s (reporting the mean and standard deviation of three technical replicates). The viscosity of 0.1% digested alginate in TR media was found to be 0.74±0.01 mPa·s. This means that the viscosity of alginate in our microfluidic experiments is 36% higher than of digested alginate, but the viscosities are close to those expected of water (0.89 mPa·s at 25 °C according to Berstad and colleagues [18]).”
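The reporting step in the quoted methods (averaging the viscosity over the shear rates 10 – 100 s⁻¹, where shear dependence is low) can be sketched as follows. This is an illustrative sketch only: the sweep values are hypothetical, and the function name `plateau_viscosity` is an assumption introduced here rather than part of the rheometer software or the authors' analysis.

```python
from statistics import mean

# Hypothetical shear-rate sweep, for illustration only (not measured data):
# (shear rate in 1/s, dynamic viscosity in mPa·s).
sweep = [(0.1, 1.40), (1.0, 1.15), (10.0, 1.04), (30.0, 1.03), (100.0, 1.02)]


def plateau_viscosity(points, lo=10.0, hi=100.0):
    """Average the viscosity over shear rates within [lo, hi] (inclusive)."""
    window = [eta for rate, eta in points if lo <= rate <= hi]
    if not window:
        raise ValueError("no points in the requested shear-rate window")
    return mean(window)


# Low shear rates are excluded from the average because the reading
# still depends on shear rate there.
print(f"{plateau_viscosity(sweep):.2f} mPa·s")
```

Restricting the average to the high-shear window keeps the noisier, shear-dependent readings at low shear rates from biasing the reported value.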
While our microfluidic setup allows us to track the position and movement of cells in a spatially structured setting, these observations do not allow us to distinguish directly whether the differences in dispersal are a result of purely physical effects of polymers on cells or are a result of them triggering a biological response in cells that causes them to become sessile. It is known that bacterial appendages like pili interact with polysaccharide residues (Li et al., 2003). Therefore, it is quite plausible that cross-linking by polysaccharides can contribute to growth behaviors on alginate. However, our analysis of gene expression demonstrates that flagellum-driven motility is decreased in the presence of alginate compared to digested alginate, alongside other major changes in gene expression. In addition, our measures of dispersal show that dispersal of cells when exposed to digested alginate is density dependent. Both observations suggest that the patterns in dispersal are governed by decision-making processes by cells resulting in changes in cell motility, rather than being a product of purely physical interactions with the polymer.
The finding that viscosities of both alginate and digested alginate are similar to that of water suggests that diffusion of nutrients in the growth chambers should be similar. Therefore, we think that differences in real concentrations of nutrients are likely not contributing to the observed differences in behavior.
(3) Why is Figure S1 arbitrary units? Does this have to do with the calibration of LC-MS? It would be better, it seems, to know the concentrations in real units of the monomer at least.
We agree with the reviewer that it would have been better to have absolute concentrations for these compounds. However, to calibrate the mass spectrometer signals (ion counts) to absolute concentrations for the different alginate compounds, we would need an analytical standard of known concentration. We are not aware of such a standard and thus report only relative concentrations. We agree that the y-axis label of Figure S1 should not contain ‘arbitrary’ units, as it shows a ratio (of measurements in the same arbitrary units). We have edited the labels of Figure S1 accordingly and the figure legend in line 26 of the Supplemental Material (“Relative concentrations…”).
(4) Line 188 - density-dependent dispersal. The claim here is that "cells in chambers with many cells were more likely to disperse than cells in chambers with less cells." (my emphasis). Looking at the data in Figure 2C it appears that about 40% of the cells disperse irrespective of the density, before the switch to digested alginate. So it would seem that there is not a higher likelihood of dispersal at higher cell densities. For the very highest cell density, it does appear that this fraction is larger, but I'd be concerned about making this claim from what I understand to be a single experiment. To support the claim made should the authors plot Change in Cell number/Starting Cell number on the y-axis of Fig. 2C to show that the fraction is increasing? It would seem some additional data at higher starting cell densities would help support this claim more strongly.
We thank the reviewer for this comment, which is in line with a remark made by reviewer 1 in their comment 1. In response to these two comments (and as described above), we have edited Figure 2C and now have plotted the change in cell number relative to starting cell number on the y-axis to directly show the density dependence. We observe a positive (approximately linear) relationship between the fraction of dispersed cells and the number of cells present in the chamber at the time of switching. This indicates that there is a density dependence in the dispersal process, with highly populated chambers showing a higher fraction of dispersed cells.
In addition to the change in Figure 2C, we have modified the paragraph around line 208: “We indeed found that the nutrient switch caused a few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”
The highest cell number at the start of the switch that we include is about 800 cells. The maximum number of cells that can fit into a chamber are ca. 1000 cells. Thus, 800 resident cells are close to the maximal density.
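One simple way to check for the positive, approximately linear relationship described above is to fit a least-squares line to the dispersed fraction against the starting cell number. The sketch below uses hypothetical values (not the data in Figure 2C), and the helper `least_squares` is an illustrative name rather than the authors' analysis code.

```python
# Hypothetical per-chamber values, for illustration only (not the measured data):
# cells present at the nutrient switch, and the fraction that then dispersed.
cells_at_switch = [70, 300, 500, 800]
fraction_dispersed = [-0.10, 0.20, 0.40, 0.60]


def least_squares(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(cells_at_switch, fraction_dispersed)
# A slope near zero would indicate a density-independent per-capita rate;
# a clearly positive slope is what density-dependent dispersal predicts.
print(f"slope per cell: {slope:.5f}, intercept: {intercept:.3f}")
```

With only a handful of chambers, such a fit is descriptive rather than conclusive, which is consistent with the reviewer's request for additional data at high starting densities.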
(5) A comment -- I find the result of significant chemotaxis towards alginate but not the monomers of alginate to be quite surprising. The ecological relevance of this (line 219) seems like an important result that is worth expanding on a bit at least in the discussion. For now, my question is whether the authors know of any mechanism by which chemotaxis receptors could respond to alginate but not the monomer. How can a receptor distinguish between the two?
We agree that this result is surprising, given that oligomers can be more easily transported into the periplasm where sensing takes place, and they also provide an easier accessible nutrient source. Indeed, in case of the insoluble polymer chitin it has been shown that chemotaxis towards chitin is mediated by chitin oligomers (Bassler et al., 1991), which was suggested as a general motif to locate polysaccharide nutrient sources (Keegstra et al., 2022). However, a recent study has changed this perspective by showing widespread chemotaxis of marine bacteria towards the glucose-based marine polysaccharide laminarin, but not towards laminarin oligomers or glucose (Clerc et al., 2023). Together with our results on chemotaxis towards alginate (but not significantly toward alginate oligomers) this suggests that chemotaxis towards soluble polysaccharides can be mediated by direct sensing of the polysaccharide molecules.
As recommended, we expanded the discussion of the ecological relevance and also added more information on possible mechanisms of selective sensing of alginate and its breakdown products (around line 479):
“Direct chemotaxis towards polysaccharides may facilitate the search for new polysaccharide sources after dispersal. We found that the presence of degradation products not only induces cell dispersal but also increases the expression of chemotaxis genes. Interestingly, we found that V. cyclitrophicus ZF270 cells show chemotaxis towards polymeric alginate but not digested alginate. This contrasts with previous findings for bacterial strains degrading the insoluble marine polysaccharide chitin, where chemotaxis was strongest towards chitin oligomers [53], suggesting that oligomers may act as an environmental cue for polysaccharide nutrient sources [55]. However, recent work has shown that certain marine bacteria are attracted to the marine polysaccharide laminarin, and not laminarin oligomers [56]. Together with our results, this indicates that chemotaxis towards soluble polysaccharides may be mediated by the polysaccharide molecules themselves. The mechanism of this behavior is yet to be identified, but could be mediated by polysaccharide-binding proteins as have been found in Sphingomonas sp. A1 facilitating chemotaxis towards pectin [57]. Direct polysaccharide sensing adds complexity to chemosensing as polysaccharides cannot freely diffuse into the periplasm, which can lead to a trade-off between chemosensing and uptake [58]. Furthermore, most polysaccharides are not immediately metabolically accessible as they require degradation. But direct polysaccharide sensing can also provide certain benefits compared to using oligomers as sensory cues. First, it could enable bacterial strains to preferably navigate to polysaccharide nutrient sources that are relatively uncolonized and hence show little degradation activity. Second, strong chemotaxis towards degradation products could hinder a timely dispersal process as the dispersal then requires cells to travel against a strong attractant gradient formed by the degradation products. 
Overall, this strategy allows cells to alternate between degradation and dispersal to acquire carbon and energy in a heterogeneous world with nutrient hotspots [44,59–61].”
(6) Comment on lines 287-8 -- that the "positive enrichment of the gene set containing bacterial motility proteins matched the increase in motile cells that we observe in Fig 3E." I'm confused about what is meant by the word "matched" here. Is the implication that there is some quantitative correspondence between increased motility in Figure 3 and the change in expression in Figure 4? Or is the statement a qualitative one -- that motility genes are upregulated in the presence of digested alginate? Table S12 didn't help me answer this question.
We thank the reviewer for their helpful comment. Our original statement was a qualitative one - observing that gene expression enrichment in genes associated with bacterial motility aligned with our expectations based on the previous observation of an increase in motile cells. We have now changed the wording to highlight the qualitative nature of this statement (line 315):
“The positive enrichment of the gene set containing bacterial motility proteins aligned with our expectations based on the increase in motile cells that we observed in Figure 3E (Fig. 4A, Table S12).”
(7) Line 326 - what is the explanation for the production of public enzymes in the presence of digest? How does this square with the previous narrative about cells growing on alginate digest expressing motility genes and chemotaxing towards alginate? It seems like the story is a bit tenuous here in the sense that digested alginates stimulate both motility - which is hypothesized to drive the discovery of new alginate particles - and lyase enzymes which are used to degrade alginate. So do the high motility cells that are chemotaxing towards alginate also express lyases en route? I'm of the opinion that constructing narratives like these in the absence of a more quantitative understanding of the colonization and degradation dynamics of alginate particles presents a major challenge and may be asking more of the data than the data can provide.
a. I noted afterwards that this is addressed around line 393 in the Discussion section.
Indeed, the notion that the presence of breakdown products triggers motility and also increases the expression of alginate lyases and other metabolic genes for alginate catabolism seems counterintuitive. We have now expanded our discussion of these results to contextualize these findings (around line 443):
"One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources [50,51]. In fact, the majority of carbon sources are sensed by prokaryotes through one‑component sensors inside the cell [50]. In the one‑component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, and lead to the upregulation of transporter and enzyme expression [50,51]. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses [50,52]. Thus, the presence of alginate breakdown products may likely result in increased expression of all components of the alginate degradation pathway, including the expression of degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes may likely decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments.
Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients."
(8) I like Figure 6, and I think this hypothesis is a good result from this paper, but I think it would be important to emphasize this as a proposal that needs further quantitative analysis to be supported.
We have now edited the manuscript to make this point more clear. While both degradation and dispersal are well-appreciated parts of microbial ecology, the transitions and underlying mechanisms are unclear. We have edited the discussion to improve the clarity (line 419):
“This cycle of biomass degradation and dispersal has long been discussed in the context of foraging e.g., [44,45,13,46,47], but the cellular mechanisms that drive the cell dispersal remain unclear.”
Also, we have updated Figure 6 to indicate more clearly which new findings this work proposes (now bold font) and which previous findings, made in different bacterial taxa and on different carbon sources, align with our work (now light font). We edited the figure legend accordingly (line 503):
"By integrating our results with previous studies on cooperative growth on the same system, as well as results on dispersal cycles in other systems, we highlight where the specific results of this work add to this framework (bold font)."
Minor comments
(1) Is there any growth on the enzyme used for alginate digestion? E.g. is the enzyme used to digest the alginate at sufficiently high concentrations that cells could utilize it for a carbon/nitrogen source?
We thank the reviewer for raising this point. We added the following paragraph as Supplemental Text to address it (line 179):
“Protein amount of the alginate lyases added to create digested alginate
Based on the following calculation, we conclude that the amount of protein added to the growth medium by the addition of alginate lyases is so small that we consider it negligible. In our experiment we used 1 unit/ml of alginate lyases in a 4.5 ml solution to digest the alginate. As the commercially purchased alginate lyases have an activity of 10,000 units/g, our 4.5 ml solution contains 0.45 mg of alginate lyase protein. The digested alginate solution was diluted 45x when added to culture medium. This means that we added 0.18 µg alginate lyase protein to 1 ml of culture medium.
As a comparison, for 1 ml of alginate medium, 1,000 µg of alginate is added, and for 1 ml of Lysogeny broth (LB) culture medium, 3,500 µg of LB are added. Thus, the amount of alginate lyase protein that we added is ca. 5,000-20,000 times smaller than the amount of alginate or LB that one would add to support cell growth. Therefore, we expect the growth that the digestion of the added alginate lyases would allow to be negligible.”
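As a quick check of the arithmetic in the paragraph above, the following minimal Python sketch reproduces the unit-to-mass conversion and the fold-difference comparison. The function and variable names are illustrative and not part of the manuscript, and the final concentration of 0.18 µg/ml is taken as reported rather than re-derived.

```python
# Illustrative sketch only: names are hypothetical, values come from the text above.

def lyase_mass_mg(units_per_ml: float, volume_ml: float, units_per_g: float) -> float:
    """Mass of enzyme (mg) needed to supply the given activity in the given volume."""
    total_units = units_per_ml * volume_ml      # 1 unit/ml * 4.5 ml = 4.5 units
    return total_units / units_per_g * 1000.0   # grams converted to milligrams


def fold_excess(substrate_ug_per_ml: float, lyase_ug_per_ml: float) -> float:
    """How many times more substrate than enzyme protein is present per ml."""
    return substrate_ug_per_ml / lyase_ug_per_ml


# Enzyme mass in the 4.5 ml digestion mix (stated in the text as 0.45 mg).
mass_mg = lyase_mass_mg(units_per_ml=1.0, volume_ml=4.5, units_per_g=10_000.0)

# Final enzyme concentration in culture medium, taken as reported in the text.
REPORTED_LYASE_UG_PER_ML = 0.18

alginate_fold = fold_excess(1_000.0, REPORTED_LYASE_UG_PER_ML)  # alginate medium
lb_fold = fold_excess(3_500.0, REPORTED_LYASE_UG_PER_ML)        # LB medium

print(f"lyase mass in digest: {mass_mg:.2f} mg")
print(f"alginate / lyase: {alginate_fold:.0f}x, LB / lyase: {lb_fold:.0f}x")
```

Under these inputs the two ratios fall within the "ca. 5,000-20,000 times" range quoted in the response.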
(2) The lines in Figure 2B are very hard to see.
We have addressed this comment by using thicker lines in Figure 2B.
(3) The black background and images in Figure 3A and B are hard to see as well.
We have now replaced Figure 3A and B, now using a white background.
(4) Typo at the beginning of line 251?
Unfortunately we failed to find the typo referred to. We are happy to address it if it still exists in the revised manuscript.
Reviewer #3 (Recommendations For The Authors):
(1) I think there is not enough experimental evidence to conclude that the underlying cause of increased motility is the accumulation of digested alginate products. To conclusively show that this is the cause and not just some signal linked to cell density, perhaps the experiment should be repeated with a different carbon source.
We thank the reviewer for their comment, which made us realize that we did not make the nature of the dispersal cue clear. The gene expression data were obtained from batch cultures and measured at approximately the same bacterial densities, which indeed shows that digested alginate is a sufficient signal for an increase in motility gene expression. This agrees very well with our observation that cells growing on digested alginate in microfluidic chambers have an increased fraction of motile cells in comparison with cells exposed to alginate (Fig 3E). However, we did not mean to suggest that the observed dispersal by bacterial motility is not influenced by cell density. In fact, we see that dispersal (and hence the increase in cell motility) in microfluidic chambers that are switched from polymeric to digested alginate depends on the bacterial density in the chamber, with higher bacterial densities showing increased dispersal. This shows that the presence of alginate oligomers does trigger dispersal through motility, but this signal affects bacterial groups in a cell-density-dependent manner.
Similar observations have been made in Caulobacter crescentus, which was found to form cell groups on the polymer xylan while cells disperse when the corresponding monomer xylose becomes available (D’Souza et al., 2021). We reference the additional work in lines 179 and 230. Taken together, these observations indicate a more general phenomenon in dispersal from polysaccharide substrates.
(2) About the expression data:
• Ribosomal proteins and ABC transporters are enriched in cells grown on digested alginate and the authors discuss that this explains the difference in max growth rate between alginate and digested alginate. However, in Figure S2E the authors report no statistical difference between growth rates.
We have now edited the manuscript to clarify this point. We found that cells grown on degradation products reached their maximal growth rate around 7.5 hours earlier (Fig. S2D) and showed increased expression of ribosomal biosynthesis and ABC transporters in late-exponential phase (Fig. 4A). We consider this shorter lag time as a sign of a different growth state and therefore a possible reason for the difference in ribosomal protein expression.
As the reviewer correctly points out, the maximum growth rates that were computed from the two growth curves were not significantly different (Fig. S2E). However, for our gene expression analysis, we harvested the transcriptome of cells that reached OD 0.39-0.41 (mid- to late-exponential phase). At this time point, the cell cultures may have differed in their momentary growth rate.
We edited the manuscript to make this clearer (line 287):
“Both observations likely relate to the different growth dynamics of V. cyclitrophicus ZF270 on digested alginate compared to alginate (Fig. S2A), where cells in digested alginate medium reached their maximal growth rate 7.5 hours earlier and thus showed a shorter lag time (Fig. S2D). As a consequence, the growth rate at the time of RNA extraction (mid-to-late exponential phase) may have differed, even though the maximum growth rates of cells grown in alginate medium and digested alginate medium were not found to be significantly different (Fig. S2E).”
• The increased expression of transporters for lyases in cells grown on digested alginate (lines 273-274 and 325-328) is very confusing and the explanation provided in lines 412-420 is not very convincing. My two cents on this: Expression of more enzymes and induction of motility might be a strategy to be prepared for more likely future environments (after dispersal, alginate is the most likely carbon source they will find). This would be in line with observed increased chemotaxis towards the polymer rather than the monomer (Similar to C. elegans).
This comment is in line with reviewer 2, comment 7. In response to these two comments (and as described above), we expanded our discussion of these results to contextualize these findings (around line 443):
“One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources [50,51]. In fact, the majority of carbon sources are sensed by prokaryotes through one‑component sensors inside the cell [50]. In the one‑component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, and lead to the upregulation of transporter and enzyme expression [50,51]. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses [50,52]. Thus, the presence of alginate breakdown products may likely result in increased expression of all components of the alginate degradation pathway, including the expression of degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes may likely decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. 
Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”
Additionally, we agree with the intriguing comment that continued expression of alginate lyases may also prepare cells for likely future environments. Whether marine bacteria are primed by their growth on one carbon source towards faster re-initiation of degradation on a new particle is an interesting question for future studies. We now address this point in our manuscript (line 458):
“However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”
(3) The yield reached by Vibrio on alginate is significantly higher than the yield in digested alginate, not similar, as stated in lines 133-134. Only cell counts are similar. Perhaps the author can correct this statement and speculate on the reason leading to this discrepancy: perhaps cells tend to aggregate in alginate despite the fact that these are well-mixed cultures.
We have edited the description of the OD measurements accordingly and agree with the reviewer that aggregation is indeed a possible reason for the discrepancy (line 141):
“We also observed that the optical density at stationary phase was higher when cells were grown on alginate (Fig. S2B and C). However, colony counts did not show a significant difference in cell numbers (Fig. S3), suggesting that the increased optical density may stem from aggregation of cells in the alginate medium, as observed for other Vibrio species [7].”
(4) I suggest toning down the importance of the results presented in this study for understanding global carbon cycling. There is a link but at present it is too much emphasized.
We have edited our statements regarding the carbon cycle. In the revised manuscript we stress the lack of direct quantifications of carbon cycling. We still refer to carbon flow in the revised manuscript, as we would argue that microbial remineralization of biomass is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000) and research on marine bacterial foraging investigates how bacterial cells manage to find and utilize this biomass.
Our revised manuscript contains the following modified statements (line 47 and line 60): “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”
“Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”
References
- Alcolombri, U., Peaudecerf, F. J., Fernandez, V. I., Behrendt, L., Lee, K. S., & Stocker, R. (2021). Sinking enhances the degradation of organic particles by marine bacteria. Nature Geoscience, 14(10), 775–780. https://doi.org/10.1038/s41561-021-00817-x
- Bassler, B. L., Gibbons, P. J., Yu, C., & Roseman, S. (1991). Chitin utilization by marine bacteria. Chemotaxis to chitin oligosaccharides by Vibrio furnissii. Journal of Biological Chemistry, 266(36), 24268–24275. https://doi.org/10.1016/S0021-9258(18)54224-1
- Chisholm, S. W. (2000). Stirring times in the Southern Ocean. Nature, 407(6805), 685–686. https://doi.org/10.1038/35037696
- Chubukov, V., Gerosa, L., Kochanowski, K., & Sauer, U. (2014). Coordination of microbial metabolism. Nature Reviews. Microbiology, 12(5), 327–340. https://doi.org/10.1038/nrmicro3238
- Clerc, E. E., Raina, J.-B., Keegstra, J. M., Landry, Z., Pontrelli, S., Alcolombri, U., Lambert, B. S., Anelli, V., Vincent, F., Masdeu-Navarro, M., Sichert, A., De Schaetzen, F., Sauer, U., Simó, R., Hehemann, J.-H., Vardi, A., Seymour, J. R., & Stocker, R. (2023). Strong chemotaxis by marine bacteria towards polysaccharides is enhanced by the abundant organosulfur compound DMSP. Nature Communications, 14(1), 8080. https://doi.org/10.1038/s41467-023-43143-z
- Dal Co, A., van Vliet, S., Kiviet, D. J., Schlegel, S., & Ackermann, M. (2020). Short-range interactions govern the dynamics and functions of microbial communities. Nature Ecology and Evolution, 4(3), 366–375. https://doi.org/10.1038/s41559-019-1080-2
- D’Souza, G., Ebrahimi, A., Stubbusch, A., Daniels, M., Keegstra, J., Stocker, R., Cordero, O., & Ackermann, M. (2023). Cell aggregation is associated with enzyme secretion strategies in marine polysaccharide-degrading bacteria. The ISME Journal. https://doi.org/10.1038/s41396-023-01385-1
- D’Souza, G. G., Povolo, V. R., Keegstra, J. M., Stocker, R., & Ackermann, M. (2021). Nutrient complexity triggers transitions between solitary and colonial growth in bacterial populations. The ISME Journal, 15(9), 2614–2626. https://doi.org/10.1038/s41396-021-00953-7
- D’Souza, G., Schwartzman, J., Keegstra, J., Schreier, J. E., Daniels, M., Cordero, O. X., Stocker, R., & Ackermann, M. (2023). Interspecies interactions determine growth dynamics of biopolymer-degrading populations in microbial communities. Proceedings of the National Academy of Sciences of the United States of America, 120(44), e2305198120. https://doi.org/10.1073/pnas.2305198120
- Fenchel, T. (2002). Microbial Behavior in a Heterogeneous World. Science, 296(5570), 1068–1071. https://doi.org/10.1126/science.1070118
- Jiao, N., Luo, T., Chen, Q., Zhao, Z., Xiao, X., Liu, J., Jian, Z., Xie, S., Thomas, H., Herndl, G. J., Benner, R., Gonsior, M., Chen, F., Cai, W.-J., & Robinson, C. (2024). The microbial carbon pump and climate change. Nature Reviews Microbiology. https://doi.org/10.1038/s41579-024-01018-0
- Keegstra, J. M., Carrara, F., & Stocker, R. (2022). The ecological roles of bacterial chemotaxis. Nature Reviews Microbiology, 20(8), 491–504. https://doi.org/10.1038/s41579-022-00709-w
- Konishi, H., Hio, M., Kobayashi, M., Takase, R., & Hashimoto, W. (2020). Bacterial chemotaxis towards polysaccharide pectin by pectin-binding protein. Scientific Reports, 10(1), 3977. https://doi.org/10.1038/s41598-020-60274-1
- Li, Y., Sun, H., Ma, X., Lu, A., Lux, R., Zusman, D., & Shi, W. (2003). Extracellular polysaccharides mediate pilus retraction during social motility of Myxococcus xanthus. Proceedings of the National Academy of Sciences, 100(9), 5443–5448. https://doi.org/10.1073/pnas.0836639100
- Martínez-Antonio, A., Janga, S. C., Salgado, H., & Collado-Vides, J. (2006). Internal sensing machinery directs the activity of the regulatory network in Escherichia coli. Trends in Microbiology, 14(1), 22–27. https://doi.org/10.1016/j.tim.2005.11.002
- McDougald, D., Rice, S. A., Barraud, N., Steinberg, P. D., & Kjelleberg, S. (2012). Should we stay or should we go: Mechanisms and ecological consequences for biofilm dispersal. Nature Reviews Microbiology, 10(1), 39–50. https://doi.org/10.1038/nrmicro2695
- Nguyen, T. T. H., Zakem, E. J., Ebrahimi, A., Schwartzman, J., Caglar, T., Amarnath, K., Alcolombri, U., Peaudecerf, F. J., Hwa, T., Stocker, R., Cordero, O. X., & Levine, N. M. (2022). Microbes contribute to setting the ocean carbon flux by altering the fate of sinking particulates. Nature Communications, 13(1), 1657. https://doi.org/10.1038/s41467-022-29297-2
- Norris, N., Alcolombri, U., Keegstra, J. M., Yawata, Y., Menolascina, F., Frazzoli, E., Levine, N. M., Fernandez, V. I., & Stocker, R. (2022). Bacterial chemotaxis to saccharides is governed by a trade-off between sensing and uptake. Biophysical Journal, 121(11), 2046–2059. https://doi.org/10.1016/j.bpj.2022.05.003
- Povolo, V. R., D’Souza, G. G., Kaczmarczyk, A., Stubbusch, A. K., Jenal, U., & Ackermann, M. (2022). Extracellular appendages govern spatial dynamics and growth of Caulobacter crescentus on a prevalent biopolymer. bioRxiv, 2022.06.13.495907. https://doi.org/10.1101/2022.06.13.495907
- Preheim, S. P., Boucher, Y., Wildschutte, H., David, L. A., Veneziano, D., Alm, E. J., & Polz, M. F. (2011). Metapopulation structure of Vibrionaceae among coastal marine invertebrates. Environmental Microbiology, 13(1), 265–275. https://doi.org/10.1111/j.1462-2920.2010.02328.x
- Schwartzman, J. A., Ebrahimi, A., Chadwick, G., Sato, Y., Orphan, V., & Cordero, O. X. (2021). Bacterial growth in multicellular aggregates leads to the emergence of complex lifecycles. bioRxiv, 2021.11.01.466752. https://doi.org/10.1101/2021.11.01.466752
- Singh, P. K., Bartalomej, S., Hartmann, R., Jeckel, H., Vidakovic, L., Nadell, C. D., & Drescher, K. (2017). Vibrio cholerae Combines Individual and Collective Sensing to Trigger Biofilm Dispersal. Current Biology, 27(21), 3359-3366.e7. https://doi.org/10.1016/j.cub.2017.09.041
- Ulrich, L. E., Koonin, E. V., & Zhulin, I. B. (2005). One-component systems dominate signal transduction in prokaryotes. Trends in Microbiology, 13(2), 52–56. https://doi.org/10.1016/j.tim.2004.12.006
- Wall, M. E., Hlavacek, W. S., & Savageau, M. A. (2004). Design of gene circuits: Lessons from bacteria. Nature Reviews Genetics, 5(1), 34–42. https://doi.org/10.1038/nrg1244
- Yawata, Y., Carrara, F., Menolascina, F., & Stocker, R. (2020). Constrained optimal foraging by marine bacterioplankton on particulate organic matter. Proceedings of the National Academy of Sciences, 117(41), 25571–25579. https://doi.org/10.1073/pnas.2012443117
- Yawata, Y., Cordero, O. X., Menolascina, F., Hehemann, J.-H., Polz, M. F., & Stocker, R. (2014). Competition–dispersal tradeoff ecologically differentiates recently speciated marine bacterioplankton populations. Proceedings of the National Academy of Sciences, 111(15), 5622–5627. https://doi.org/10.1073/pnas.1318943111
- Zöttl, A., & Yeomans, J. M. (2019). Enhanced bacterial swimming speeds in macromolecular polymer solutions. Nature Physics, 15(6), 554–558. https://doi.org/10.1038/s41567-019-0454-3
-
eLife assessment
This manuscript is a valuable contribution to our understanding of foraging behaviors in marine bacteria. The authors present a conceptual model for how a marine bacterial species consumes an abundant polysaccharide. Using experiments in microfluidic devices and through measurements of motility and gene expression, the authors offer convincing evidence that the degradation products of polysaccharide digestion can stimulate motility.
-
Reviewer #1 (Public Review):
Summary:
The authors attempt to understand how cells forage for spatially heterogeneous complex polysaccharides. They aimed to quantify the foraging behavior and interrogate its genetic basis. The results show that cells aggregate near complex polysaccharides and disperse when simpler byproducts are added. Dispersing cells tend to move towards the polysaccharide. The authors also use transcriptomics to attempt to understand which genes support each of these behaviors, with motility and transporter-related genes being highly expressed during dispersal, as expected.
Strengths:
The paper is well written and builds on previous studies by some of the authors showing similar behavior by a different species of bacteria (Caulobacter) on another polysaccharide (xylan). The conceptual model presented at the end encapsulates the findings and provides an interesting hypothesis. I also find the observation of chemotaxis towards the polysaccharide in the experimental conditions interesting.
Weaknesses:
Much of the genetic analysis, as it stands, is quite speculative and descriptive. I found myself confused about many of the genes (e.g., quorum sensing) that pop up enriched during dispersal quite in contrast to my expectations. While the authors do discuss this in the text as worth following up on, I think the analysis as it stands is speculative about the behaviors observed. In the authors' defense, I acknowledge that it might have the potential to generate hypotheses and thus aid future studies.
-
Reviewer #2 (Public Review):
Summary:
The paper sets out to understand the mechanisms underlying the colonization and degradation of marine particles using a natural Vibrio isolate as a model. The data are measurements of motility and gene expression using microfluidic devices and RNA sequencing. The results reveal that degradation products of alginate do stimulate motility but not chemotaxis. In contrast, alginate itself (the polymer) does stimulate chemotaxis. Further, the dispersal from degrading alginate is density dependent, increasing at higher density. The evidence for these claims is strong. From these the authors propose a narrative (Fig. 6) for growth and dispersal cycles in this system. The idea is that cells colonize and degrade alginate, this degradation stimulates motility and dispersal followed by chemotaxis to a new alginate source. This complete narrative has modest support in the data. A quantitative description of these dynamics awaits future studies.
Strengths:
The microfluidic measurements are the central strength of the paper. The density dependence claim is qualitatively supported by the data. The motility and chemotaxis claims are also well supported by the data. The presentation of the experiment and results is well done. The study serves to motivate a unifying picture of growth and dispersal in marine systems. This is a key process in the global carbon cycle.
Weaknesses:
Perhaps not a weakness, but a glimmer that this is not yet the full story. The RNA expression data show alginate lyase expression in response to digested alginate which is unexpected given the narrative articulated above. Why express lyases while leaving the polymer patch via motility? This question is addressed in the Discussion. A holistic and quantitative picture of the proposed process in Figure 6 awaits additional studies.
-
Reviewer #3 (Public Review):
Summary:
In this manuscript, Stubbusch and coauthors examine the foraging behavior of a marine species consuming an abundant marine polysaccharide. Laboratory experiments in a microfluidic setup are complemented with transcriptomic analyses aiming at assessing the genetic bases of the observed behavior. Bacterial cells consuming the polysaccharide form cohesive aggregates, but start dispersing when the byproducts of polysaccharide digestion start accumulating. Dispersing cells tend to be attracted by the polysaccharide. Expression data show that motility genes are enriched during the dispersal phase, as expected. Counterintuitively, in the same phase, genes for transporters and polysaccharide digestion are also highly expressed.
Strengths:
The manuscript is very well written and easy to follow. The topic is interesting and timely. The genetic analyses provide a new, albeit complex, angle to the study of foraging behaviors in bacteria, adding to previous studies conducted on other species.
Weaknesses:
I find this paper very descriptive and speculative. The results of the genetic analyses are quite counterintuitive; therefore, I understand the difficulty of connecting them to the observations coming from experiments in the microfluidic device. However, they could be better placed in the literature on foraging-dispersal cycles, beyond bacteria. In addition, the interpretation of the results is sometimes confusing.
-
-
www.biorxiv.org
-
Reviewer #1 (Public Review):
The authors effectively delineate the differential distribution and behaviour of MNPs within the heart, noting that these cells can be characterised by their expression levels of csf1ra and mpeg1.1. Key findings include the identification of distinct origins for larval macrophage populations and the sustained presence of csf1ra-expressing cells on the surface of the adult heart. The study examines the embryonic development of these MNPs, revealing that csf1ra+ cells begin populating the heart from embryonic day 3, while mpeg1.1+ cells colonise the heart around day 10, with a significant increase by day 17. Given that the emergence of mpeg1.1+ cells coincides with the reported timing for the onset of haematopoietic stem cell-derived haematopoiesis, the authors combined kaede-lineage tracing experiments and mutant backgrounds to conclude that the earliest tissue-resident macrophages in the heart are derived from primitive haematopoiesis.
The authors also note that the spatial distribution of MNPs varies, with csf1ra+ cells found on the atrium and ventricle surfaces, while mpeg1.1+ cells are initially located on the surface but later distributed throughout the cardiac tissue. Notably, the study demonstrates that tissue-resident macrophages proliferate rapidly following cardiac injury. The authors observe an increased number of proliferating csf1ra+ cells, especially in csf1ra mutant zebrafish, which likely correspond to primitive-derived tissue-resident macrophages that rapidly respond to injury and contribute to the reduced scarring observed in these mutants.
This manuscript makes an important contribution to the field by enhancing our understanding of the ontogeny of tissue-resident macrophages in the heart and their cellular behaviour in a vertebrate model capable of heart regeneration.
Strengths:
This work presents a landmark study on the ontogeny and cellular behaviour of macrophages in the zebrafish heart as it comprehensively examines their development and distribution in both embryonic and adult stages.
One of the key strengths of this study is its thorough cellular description using a range of available genetic tools. By employing transgenic lines to differentiate between a few MNP subtypes, the authors provide a detailed and nuanced understanding of these cells' origins, distribution, and behaviour. This approach allows for high-resolution characterisation of MNP populations, revealing significant insights into their potential role in cardiac homeostasis and regeneration.
Furthermore, the study's findings are significant as they parallel those observed in mouse models, thereby reinforcing the validity and relevance of the zebrafish as a model organism for studying macrophage function in the context of cardiac injury. This comparative aspect underscores the evolutionary conservation of these cellular processes and enhances the study's impact.
Another notable strength is the use of ex vivo imaging techniques, which enable the authors to observe and study the dynamic behaviour of MNPs in heart tissue in real-time. This live imaging capability is crucial for understanding how these cells interact with their environment, particularly in response to cardiac injury. The ability to visualise MNP proliferation and movement provides valuable insights into the mechanisms underlying tissue repair and regeneration.
Weaknesses:
While the manuscript offers significant insights into the ontogeny and behaviour of MNPs in the zebrafish heart, a few limitations described below should be considered:
One potential issue lies in the lineage tracing experiments using the photoconversion Tg(csf1ra:Gal4); Tg(UAS:kaede) line. The authors photoconverted all cardiac tissue macrophages present at 2 days post-fertilisation (dpf) and examined the hearts of these fish at 21 dpf. Although photoconverted macrophages were still observed at 21 dpf, the majority of cells present in the heart at that time were non-photoconverted (cyan) csf1ra+ cells. While this suggests that early-seeded embryonic csf1ra+ macrophages are retained during late larval stages, the contribution of macrophages derived from haematopoietic stem cells (HSCs) might be overestimated. An important concern is that the kaede-converted cells could have proliferated during the embryonic timeframe analysed, thereby diluting and extinguishing the converted kaede protein. This dilution effect could lead to an underestimation of the contribution of primitive embryonic macrophages relative to the HSC-derived cells, resulting in an inaccurate assessment of the proportion of embryonic-derived tissue-resident macrophages over time.
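As a rough illustration of the dilution concern raised above, the following sketch shows how quickly a photoconverted signal would fade under idealised assumptions. None of these assumptions come from the manuscript: the converted protein is taken to be split equally between daughter cells at each division, with no degradation and no new converted protein produced, and the detection threshold is a hypothetical value chosen only for the example.

```python
def remaining_signal(divisions: int) -> float:
    """Fraction of the original converted protein left per cell after a
    given number of divisions, assuming equal partitioning (assumption)."""
    return 0.5 ** divisions


def divisions_until_undetectable(detection_fraction: float) -> int:
    """Smallest number of divisions after which the per-cell signal falls
    below a hypothetical detection threshold (fraction of the initial signal)."""
    if not 0 < detection_fraction < 1:
        raise ValueError("detection_fraction must be between 0 and 1")
    divisions = 0
    signal = 1.0
    while signal >= detection_fraction:
        signal /= 2  # each division halves the per-cell signal
        divisions += 1
    return divisions


if __name__ == "__main__":
    # Hypothetical threshold: signal is lost below 10% of its initial value.
    print(remaining_signal(3))
    print(divisions_until_undetectable(0.10))
```

Under these simplified assumptions only a handful of divisions would be needed for converted cells to be scored as non-converted, which is why proliferation between 2 and 21 dpf matters for interpreting the proportions.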
Moreover, the study reports no significant difference in immune cell numbers in the hearts of cmyb-/- mutants, which have normal primitive haematopoiesis but lack HSCs, at 5 dpf. Given the authors' suggestion that mpeg+ cells originate from the HSC wave, it is essential to assess the number of mpeg+ cells in these mutants at later stages. This assessment would clarify whether mpeg+ cells are indeed HSC-derived or if csf1ra+ cells later switch on mpeg expression. Without this additional data, conclusions about the origins of mpeg+ cells remain speculative.
The study's reliance on available genetic tools, while a strength, also introduces limitations. The use of only a few transgenic lines will not fully capture the complexity and diversity of MNP populations, leading to an incomplete understanding of their roles and dynamics.
Furthermore, while the use of ex vivo imaging provides dynamic insights into cell behaviour, it may not fully capture the complexity of in vivo conditions, possibly overlooking interactions and influences present in the living organism.
The manuscript would benefit from increasing the sample sizes to ensure the robustness of the findings. The use of Phalloidin staining to delineate single cells more accurately would also enhance the precision of cell counting and improve the overall data quality.
The study could also benefit from a more in-depth exploration of the functional consequences of MNP heterogeneity in the heart. While the cellular characterisations are thorough, the molecular and regulatory insights provided by the study are limited to a couple of RT-PCRs for some known genes.
Overall, the manuscript by Moyse and colleagues significantly advances our understanding of the ontogeny and behaviour of macrophages in the zebrafish heart, revealing important parallels with mammalian models. However, the points above should be carefully considered when interpreting the results presented in this study.
-
eLife assessment
The manuscript presented by Moyse and colleagues provides valuable insight into the origin, morphology, dynamics, and behavior of several populations of mononuclear phagocytes in the zebrafish heart. The study presents solid evidence through the use of transgenic lines and live imaging, although some limitations related to lineage tracing and molecular profiles should be considered. This work exemplifies the use of zebrafish as a model to study the role of leukocytes in cardiac development and regeneration and could potentially draw broader interest from biologists working in immunology fields.
-
Reviewer #2 (Public Review):
In this manuscript, Moyse et al. investigated the origins and potential functions of distinct populations of mononuclear phagocytes (MNPs) in the heart of developing and adult zebrafish. First, the authors demonstrate that the embryonic zebrafish heart contains macrophages early in development and that mpeg1.1- and csf1ra-expressing macrophages vary across time and location, and show that cardiac tissue macrophages (cTMs) in the juvenile heart are derived from primitive hematopoiesis. By combining the two transgenes, the authors demonstrate that there are 3 distinct (later determined to be 4) subpopulations of MNPs in adult hearts and that the distribution of these subtypes is distinct within the heart, consistent with differing distributions of primitive and definitive macrophages in mammalian hearts. Further analysis of these populations demonstrates distinct morphologies of the subpopulations, and analysis of markers conserved in mammals demonstrates distinct expression profiles as well. The authors go on to demonstrate that these subpopulations also demonstrate distinct behaviors via ex-vivo imaging. Lastly, the authors investigated the roles of these subpopulations in a model of cardiac injury in adult zebrafish and demonstrated that primitive-derived cTMs proliferate after injury, consistent with mammalian models, and that the proliferation of these macrophages likely results in reduced scarring in csf1ra mutants, which have reduced recruitment of pro-inflammatory definitive macrophages. The data presented in this manuscript provide solid evidence that zebrafish MNPs behave consistently with those in mammals and further solidify the use of zebrafish models as a useful tool in studying the role of these cells in cardiac repair and regeneration.
The data presented in this manuscript strongly support the conclusions made by the authors, and the study utilizes novel techniques. The authors appear to have achieved the goals they set out to investigate. The use of ex-vivo imaging to visualize the movement of these macrophage populations within the heart is especially compelling. The combined use of commonly used transgenic reporters for zebrafish macrophages is a very nice use of existing tools to address new questions and highlight the distinct populations of macrophages. While the overall manuscript is very strong and is likely to have a great impact on the field, there are a few weaknesses that should be addressed prior to acceptance:
(1) The reasoning for N used in many of these experiments is not addressed, nor is the question of the number of times experiments were performed. For purposes of rigor and reproducibility, these questions should be addressed in the methods.
(2) In investigating homologs of zebrafish and mammalian genes, the inclusion of additional classical markers and novel markers of subpopulations highlighted in numerous recent studies using single-cell RNA sequencing would greatly add to the impact.
(3) The description of the RT-PCR experiment is not included in the methods. Detailed methods should be provided including probe sequences. Additionally, a quantitative method of presenting this data would strengthen the conclusions presented here as well as the inclusion of additional markers as discussed previously.
-
Reviewer #3 (Public Review):
In this manuscript, Moyse et al. build on previously published data and investigate several subtypes of mononuclear phagocytes within the larval, juvenile, and adult zebrafish heart. Through the use of mpeg1.1 and csf1ra transgenic lines, the authors characterize the seeding of macrophages in the embryonic and larval heart and describe localization, proportions, morphology, and behavior of several subtypes of mpeg1.1 and csf1ra macrophages in the adult uninjured heart. The authors further provide an analysis of marker gene expression in the differing macrophage subtypes in the uninjured adult heart. Lastly, the authors perform analyses of how these populations respond to cardiac injury and show that csf1ra is important for the proportion and proliferation of these different subtypes of macrophages in the heart.
While the presence of cardiac resident macrophages and their importance in heart regeneration and cardiac disease have been extensively studied in the mouse, the same attention has only recently been given to macrophages in the adult zebrafish heart. This study provides insight into many parallels that exist between resident macrophages in the mouse and zebrafish heart, and while not especially novel, this concept is important for the zebrafish cardiac field. Overall, the conclusions of this study are mostly well supported by the data, but further analysis of marker gene expression in the various macrophage subtypes described would be an important and useful addition for zebrafish researchers studying macrophages in heart regeneration. For example, how are markers of cardiac resident macrophages (described in Wei et al, doi: 10.7554/eLife.84679) expressed in the different mpeg1.1 and csf1ra populations?
-
-
www.medrxiv.org www.medrxiv.org
-
eLife assessment
This paper explores the relationships among evolutionary and epidemiological quantities in influenza, and presents fundamental findings that substantially advance our understanding of the drivers of influenza epidemics. The authors use a rich set of data sources to gather and analyze compelling evidence on the roles of genetic distance, other influenza dynamics and epidemiological indicators in predicting influenza epidemics. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor. This paper also makes relevant data available to the research community.
-
Reviewer #1 (Public Review):
Summary:
The authors aimed to investigate the contribution of antigenic drift in the HA and NA genes of seasonal influenza A(H3N2) virus to their epidemic dynamics. Analyzing 22 influenza seasons before the COVID-19 pandemic, the study explored various antigenic and genetic markers, comparing them against indicators characterizing the epidemiology of annual outbreaks. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor.
Major Strengths:
The paper is well-organized, written with clarity, and presents a comprehensive analysis. The study design, incorporating a span of 22 seasons, provides a robust foundation for understanding influenza dynamics. The inclusion of diverse antigenic and genetic markers enhances the depth of the investigation, and the exploration of subtype interference adds valuable insights.
Major Weaknesses:
While the analysis is thorough, some aspects require deeper interpretation, particularly in the discussion of certain results. Clarity and depth could be improved in the presentation of findings, and minor adjustments are suggested. Furthermore, the evolving dynamics of H3N2 predominance post-2009 need better elucidation.
Comments on revised version:
The authors have addressed each of the comments well. I have no further comments.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
The authors aimed to investigate the contribution of antigenic drift in the HA and NA genes of seasonal influenza A(H3N2) virus to their epidemic dynamics. Analyzing 22 influenza seasons before the COVID-19 pandemic, the study explored various antigenic and genetic markers, comparing them against indicators characterizing the epidemiology of annual outbreaks. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor.
Major Strengths:
The paper is well-organized, written with clarity, and presents a comprehensive analysis. The study design, incorporating a span of 22 seasons, provides a robust foundation for understanding influenza dynamics. The inclusion of diverse antigenic and genetic markers enhances the depth of the investigation, and the exploration of subtype interference adds valuable insights.
Major Weaknesses:
While the analysis is thorough, some aspects require deeper interpretation, particularly in the discussion of certain results. Clarity and depth could be improved in the presentation of findings. Furthermore, the evolving dynamics of H3N2 predominance post-2009 need better elucidation.
Reviewer #2 (Public Review):
Summary: This paper aims to achieve a better understanding of how the antigenic or genetic compositions of the dominant influenza A viruses in circulation at a given time are related to key features of seasonal influenza epidemics in the US. To this end, the authors analyze an extensive dataset with a range of statistical, data science and machine learning methods. They find that the key drivers of influenza A epidemiological dynamics are interference between influenza A subtypes and genetic divergence, relative to the previous one or two seasons, in a broader range of antigenically related sites than previously thought.
Strengths: A thorough investigation of a large and complex dataset.
Weaknesses: The dataset covers a 21-year period, which is substantial by epidemiological standards but quite small from a statistical or machine learning perspective. In particular, it was not possible to follow the usual process and test predictive performance of the random forest model with an independent dataset.
Reviewer #3 (Public Review):
Summary:
This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics. It's a strong paper representing a thorough and fascinating exploration of potential drivers, and it makes a trove of relevant data readily available to the community.
Strengths:
This paper makes links between epidemiological and evolutionary data for influenza. Placing each in the context of the other is crucial for understanding influenza dynamics and evolution and this paper does a thorough job of this, with many analyses and nuances. The results on the extent to which evolutionary factors relate to epidemic burden, and on interference among influenza types, are particularly interesting. The github repository associated with the paper is clear, comprehensive, and well-documented.
Weaknesses:
The format of the results section can be hard to follow, and we suggest improving readability by restructuring and simplifying in some areas. There are a range of choices made about data preparation and scaling; the authors could explore sensitivity of the results to some of these.
Response to public reviews
We appreciate the positive comments from the reviewers and have implemented or responded to all of the reviewers’ recommendations.
In response to Reviewer 1, we expand on the potential drivers and biological implications of the findings pointed out in their specific recommendations. For example, we now explicitly mention that antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study. We note that, after the 2009 A(H1N1) pandemic, the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons is lower compared to A(H3N2) dominant seasons prior to 2009. We propose that the weakening of A(H3N2) predominance may be linked to the diversification of A(H3N2) viruses during the 2010s, wherein multiple antigenically distinct clades with similar fitness circulated in each season, as opposed to a single variant with high fitness.
In response to Reviewer 2, we agree that it would be ideal and best practice to measure model performance with an independent test set, but our dataset includes only ~20 seasons. Predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. In the revised manuscript, we provide more justification and clarification of our methodology. Instead of testing model performance on an independent test set, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (Kuhn & Johnson, 2019).
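The leave-one-season-out scheme described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy season labels and outcomes, and the placeholder "model" (the mean of the analysis-set outcomes) are all assumptions made for the example.

```python
from statistics import mean


def leave_one_season_out(seasons):
    """Yield (held_out_season, analysis_indices, assessment_indices).

    Each unique season is held out once as the assessment set; the rows
    from all remaining seasons form the corresponding analysis set.
    """
    for held_out in sorted(set(seasons)):
        analysis = [i for i, s in enumerate(seasons) if s != held_out]
        assessment = [i for i, s in enumerate(seasons) if s == held_out]
        yield held_out, analysis, assessment


def cv_predictions(seasons, outcomes):
    """Predict each row from a model trained without that row's season.

    The "model" is a placeholder (analysis-set mean); a real analysis
    would fit a random forest or penalized regression here instead.
    """
    predictions = [None] * len(outcomes)
    for _, analysis, assessment in leave_one_season_out(seasons):
        baseline = mean(outcomes[i] for i in analysis)
        for i in assessment:
            predictions[i] = baseline
    return predictions


if __name__ == "__main__":
    # Hypothetical toy data: two regional observations per season.
    seasons = ["1997-1998", "1997-1998", "1998-1999",
               "1998-1999", "1999-2000", "1999-2000"]
    outcomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(cv_predictions(seasons, outcomes))
```

Grouping the folds by season rather than by row keeps all regional observations from a held-out season out of training, so every prediction is for a season the model has not seen.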
In response to Reviewer 3, we follow the reviewer’s advice to put the Methods section before the Results section. Concerning Reviewer 3’s question about the sensitivity of our results to data preparation and rescaling, we provide more justification and clarification of our methodology in the revised manuscript. In our study, we adjust influenza type/subtype incidences for differences in reporting between the pre- and post-2009 pandemic periods and across HHS regions. We adjust for differences in reporting between the pre- and post-2009 periods because the US CDC and WHO increased laboratory testing capacity in response to the 2009 A(H1N1) pandemic, which led to substantial, long-lasting improvements to influenza surveillance that are still in place today. Figure 1 - figure supplement 2 shows systematic increases in influenza test volume in all HHS regions after the 2009 pandemic. Given the substantial increase in test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results when adjusting for both pre- and post-2009 pandemic reporting and regional reporting versus only adjusting for the pre- and post-2009 pandemic reporting.
Reviewer #1 (Recommendations For The Authors):
Specific comments:
(1) Line 155-156. Request for a reference for: "Given that protective immunity wanes after 1-4 years"
We now include two references (He et al. 2015 and Wraith et al. 2022), which were cited at the beginning of the introduction when referring to the duration of protective immunity for antigenically homologous viruses. (Lines 640-642 in revised manuscript)
(2) Line 162-163: Request a further explanation of the negative correlation between seasonal diversity of HA and NA LBI values and NA epitope distance. Clarify biological implications to aid reader understanding.
In the revised manuscript we expand on the biological implications of A(H3N2) virus populations characterized by high antigenic novelty and low LBI diversity.
Lines 649-653:
“The seasonal diversity of HA and NA LBI values was negatively correlated with NA epitope distance (Figure 2 – figure supplements 5 – 6), with high antigenic novelty coinciding with low genealogical diversity. This association suggests that selective sweeps tend to follow the emergence of drifted variants with high fitness, resulting in seasons dominated by a single A(H3N2) variant rather than multiple cocirculating clades.”
(3) Figure S3 legend t-2 may be marked as t-1.
Thank you for catching this. We have fixed this typo. Note: Figure S3 is now Figure 2 – figure supplement 5.
(4) Lines 201-214. The key takeaways from the analysis of subtype dominance are ultimately not clear. It also misses the underlying dynamics that H3N2 predominance following an evolutionary change has waned since 2009.
In the revised manuscript we elaborate on key takeaways concerning the relationship between antigenic drift and A(H3N2) dominance. We also add a caveat noting that A(H3N2) predominance is weaker during the post-2009 period, which may be linked to the diversification of A(H3N2) lineages after 2012. We do not know of a reference that links the diversification of A(H3N2) viruses in the 2010s to a particular evolutionary change. Therefore, we do not attribute the diversification of A(H3N2) viruses to a specific evolutionary change in A(H3N2) variants circulating at the time (A/Perth/16/2009-like strains (PE09)). Instead, we allude to the potential role of A(H3N2) diversification in creating multiple co-circulating lineages that may have less of a fitness advantage.
Lines 681-703:
“We explored whether evolutionary changes in A(H3N2) may predispose this subtype to dominate influenza virus circulation in a given season. A(H3N2) subtype dominance – the proportion of influenza positive samples typed as A(H3N2) – increased with H3 epitope distance (t – 2) (R² = 0.32, P = 0.05) and N2 epitope distance (t – 1) (R² = 0.34, P = 0.03) (regression results: Figure 4; Spearman correlations: Figure 3 – figure supplement 1). Figure 4 illustrates this relationship at the regional level across two seasons in which A(H3N2) was nationally dominant, but where antigenic change differed. In 2003-2004, we observed widespread dominance of A(H3N2) viruses after the emergence of the novel antigenic cluster, FU02 (A/Fujian/411/2002-like strains). In contrast, there was substantial regional heterogeneity in subtype circulation during 2007-2008, a season in which A(H3N2) viruses were antigenically similar to those circulating in the previous season. Patterns in type/subtype circulation across all influenza seasons in our study period are shown in Figure 4 – figure supplement 1. As observed for the 2003-2004 season, widespread A(H3N2) dominance tended to coincide with major antigenic transitions (e.g., A/Sydney/5/1997 (SY97) seasons, 1997-1998 to 1999-2000; A/California/7/2004 (CA04) season, 2004-2005), though this was not universally the case (e.g., A/Perth/16/2009 (PE09) season, 2010-2011).
After the 2009 A(H1N1) pandemic, A(H3N2) dominant seasons still occurred more frequently than A(H1N1) dominant seasons, but the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons was lower compared to A(H3N2) dominant seasons prior to 2009. Antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study (https://nextstrain.org/seasonal-flu/h3n2/ha/12y@2024-05-13) (Dhanasekaran et al., 2022; Huddleston et al., 2020; Yan et al., 2019). The decline in A(H3N2) predominance during the post-2009 period may be linked to the genetic and antigenic diversification of A(H3N2) viruses, wherein multiple lineages with similar fitness co-circulated in each season.”
(5) Line 253-255: It would be beneficial to provide a more detailed interpretation of the statement that "pre-2009 seasonal A(H1N1) viruses may limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses." Elaborate on the cause-and-effect relationship within this statement.
In the revised manuscript we suggest that seasonal A(H1N1) viruses may interfere with the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, because seasonal A(H1N1) viruses and A(H3N2) are more closely related, and thus may elicit stronger cross-reactive T cell responses.
Lines 738-745:
“The internal gene segments NS, M, NP, PA, and PB2 of A(H3N2) viruses and pre-2009 seasonal A(H1N1) viruses share a common ancestor (Webster et al., 1992) whereas A(H1N1)pdm09 viruses have a combination of gene segments derived from swine and avian reservoirs that were not reported prior to the 2009 pandemic (Garten et al., 2009; Smith et al., 2009). Non-glycoprotein genes are highly conserved between influenza A viruses and elicit cross-reactive antibody and T cell responses (Grebe et al., 2008; Sridhar, 2016). Because pre-2009 seasonal A(H1N1) viruses and A(H3N2) are more closely related, we hypothesized that seasonal A(H1N1) viruses could potentially limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, due to greater T cell-mediated cross-protective immunity.”
(6) In the results section, many statements report statistical results of correlation analyses. Consider providing further interpretations of these results, such as the implications of nonsignificant correlations and how they support or contradict the hypothesis or previous studies. For example, the statement on line 248 regarding the lack of significant correlation between influenza B epidemic size and A(H3N2) epidemic metrics would benefit from additional discussion on what this non-significant correlation signifies and how it relates to the hypothesis or previous research.
In the Discussion section, we suggest that the lack of an association between influenza B circulation and A(H3N2) epidemic metrics is due to few T and B cell epitopes shared between influenza A and B viruses (Terajima et al., 2013).
Lines 1005-1007 in revised manuscript (Lines 513-515 in original manuscript):
“Overall, we did not find any indication that influenza B incidence affects A(H3N2) epidemic burden or timing, which is not unexpected, given that few T and B cell epitopes are shared between the two virus types (Terajima et al., 2013).”
Minor comments:
(1) Line 116-122: Include a summary statistical description of all collected data sets, detailing the number of HA and NA sequence data and their sources. Briefly describe subsampled data sets, specifying preferences (e.g., the number of HA or NA sequence data collected from each region).
In our revised manuscript we now include supplementary tables that summarize the number of A/H3 and A/N2 sequences in each subsampled dataset, aggregated by world region, for all seasons combined (Figure 2 - table supplements 1 - 2). We also include supplementary figures showing the number of sequences collected in each month and each season in North America versus the other nine world regions combined (Figure 2 - figure supplements 1 - 2). Subsampled datasets are plotted individually in the figures below but individual time series are difficult to discern due to minor differences in sequence counts across the datasets.
(2) Figure 7A: Due to space limitations, consider rounding numbers on the x-axis to whole numbers for clarity.
Thank you for this suggestion. In the revised manuscript we round numbers in the axes of Figure 7A (Figure 9A in the revised manuscript) so that the axes are less crowded.
(3) Figure 4C & Figure 4D: Note that Region 10 (purple) data were unavailable for seasons before 2009 (lines 1483-1484). Label each region on the map with its respective region number (1 to 10) and indicate this in the legend for easy identification.
In our original submission, the legend for Figure 4 included “Data for Region 10 (purple) were not available for seasons prior to 2009” at the end of the caption. We have moved this sentence, as well as other descriptions that apply to both C and D, so that they follow the sentence “C-D. Regional patterns of influenza type and subtype incidence during two seasons when A(H3N2) was nationally dominant.”
In our revised manuscript, Figure 4, and Figure 4 - figure supplement 1 (Figure S10 in original submission) include labels for each HHS region.
We did not receive specific recommendations from Reviewer #2. However, our responses to Reviewer #3 address the study’s weaknesses mentioned by Reviewer #2.
Reviewer #3 (Recommendations For The Authors):
This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics.
This is a workhorse of a paper, in the volumes of data that are analyzed and the extensive analysis that is done. The data that are provided are a treasure trove resource for influenza modelers and for anyone interested in seeing influenza surveillance data in the context of evolution, and evolutionary information in the context of epidemiology.
L53 - end of sentence "and antigenic drift": not sure this fits, explain? I thought this sentence was in contrast to antigenic drift.
Thank you for catching this. We did not intend to include “and antigenic drift” at the end of this sentence and have removed it (Line 59).
Para around L115: would using primarily US data be a limitation, because it's global immunity that shapes success of strains? Or, how much does each country's immunity and vaccination and so on actually shape what strains succeed there, compared to global/international factors?
The HA and NA phylogenetic trees in our study are enriched with US sequences because our study focuses on epidemiological dynamics in the US, and we wanted to prioritize A(H3N2) viruses that the US human population encountered in each season. We agree with the reviewer that the world population may be the right scale to understand how immunity, acquired by vaccination or natural infection, may shape the emergence and success of new lineages that will go on to circulate globally. However, our study assesses the overall impact of antigenic drift on regional A(H3N2) epidemic dynamics in the US. In other words, our driving question is whether we can predict the population-level impact of an A(H3N2) variant in the US, conditional on this particular lineage having established in the US and circulating at relatively high levels. We do not assess the global or population-level factors that may influence which A(H3N2) virus lineages are successful in a given location or season.
We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader.
Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”
In the Results section, I found the format hard to follow, because of the extensive methodological details, numbers with CIs and long sentences. Sentences sometimes included the question, definitions of variables, and lists. For example at line 215 we have: "Next, we tested for associations between A(H3N2) evolution and epidemic timing, including onset week, defined as the winter changepoint in incidence [16], and peak week, defined as the first week of maximum incidence; spatiotemporal synchrony, measured as the variation (standard deviation, s.d.) in regional onset and peak timing; and epidemic speed, including seasonal duration and the number of weeks from onset to peak (Table 2, Figure S11)". I would suggest putting the methods section first, using shorter sentences, separating lists from the question being asked, and stating what was found without also putting in all the extra detail. Putting the methods section before the results might reduce the sense that you have to explain what you did and how in the results section too.
Thank you for suggesting how to improve the readability of the Results section. In the revised manuscript, we follow the reviewer’s advice to put the Methods section before the Results section. Although eLife formatting requirements specify the order: Introduction, Results, Discussion, and Methods, the journal allows for the Methods section to follow the Introduction when it makes sense to do so. We agree with the reviewer that putting the Methods section before the Results section makes our results easier to follow because we no longer need to introduce methodological details at the beginning of each set of results.
L285 in the RF you remove variables without significant correlations with the target variables, but isn't one of the aims of RF to uncover relationships where a correlation might not be evident, and in part to reveal combinations of features that give the targeted outcome? Also with the RF, I am a bit concerned that you could not use the leave-one-out approach because it was "unstable" - presumably that means that you obtain quite different results if you leave out a season. How robust are these results, and what are the most sensitive aspects? Are the same variables typically high in importance if you leave out a season, for example? What does the scatterplot of observed vs predicted epidemic size (as in Fig 7) look like if each prediction is for the one that was left out (i.e. from a model trained on all the rest)? In my experience, where the RF is "unstable", that can look pretty terrible even if the model trained on all the data looks great (as does Figure 7). In any case I think it's worth discussing sensitivity.
(1) In response to the reviewer’s first question, we explain our rationale for not including all candidate predictors in random forest and penalized regression models.
Models trained with different combinations of predictors can have similar performance, and these combinations of predictors can include variables that do not necessarily have strong univariate associations with the target variable. The performance of random forest and LASSO regression models is not sensitive to redundant or irrelevant predictors (see Figure 10.2 in Kuhn & Johnson, 2019). However, if our goal is variable selection rather than strictly model performance, it is considered best practice to remove collinear, redundant, and/or irrelevant variables prior to training models (see section 11.3 in Kuhn & Johnson, 2019). In both random forest and LASSO regression models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection. In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores. Thus, failing to minimize multicollinearity prior to model training could result in some variables having low rankings and the appearance of being unimportant, because their importance scores are overshadowed by those of the highly correlated variables. Our rationale for preprocessing predictor data follows the philosophy of Kuhn & Johnson, 2019, who recommend including the minimum possible set of variables that does not compromise model performance. Even if a particular model is insensitive to extra predictors, Kuhn and Johnson explain that “removing predictors can reduce the cost of acquiring data or improve the throughput of the software used to make predictions.”
In the revised manuscript, we include more details about our steps for preprocessing predictor data. We also follow the reviewer’s suggestion to include all evolutionary predictors in variable selection analyses, regardless of whether they have strong univariate correlations with target outcomes, because the performance of random forest and LASSO regression models is not affected by redundant predictors.
Including additional predictors in our variable selection analyses does not change our conclusions. As reported in our original manuscript, predictors with strong univariate correlations with various epidemic metrics were the highest ranked features in both random forest and LASSO regression models.
Lines 523-563:
“Preprocessing of predictor data: The starting set of candidate predictors included all viral fitness metrics: genetic and antigenic distances between current and previously circulating strains and the standard deviation and Shannon diversity of H3 and N2 LBI values in the current season. To account for potential type or subtype interference, we included A(H1N1) or A(H1N1)pdm09 epidemic size and B epidemic size in the current and prior season and the dominant IAV subtype in the prior season (Lee et al., 2018). We included A(H3N2) epidemic size in the prior season as a proxy for prior natural immunity to A(H3N2). To account for vaccine-induced immunity, we considered four categories of predictors and included estimates for the current and prior seasons: national vaccination coverage among adults (18-49 years coverage × ≥ 65 years coverage), adjusted A(H3N2) vaccine effectiveness (VE), a combined metric of vaccination coverage and A(H3N2) VE (18-49 years coverage × ≥ 65 years coverage × VE), and H3 and N2 epitope distances between naturally circulating A(H3N2) viruses and the U.S. A(H3N2) vaccine strain in each season. We could not include a predictor for vaccination coverage in children or consider clade-specific VE estimates, because these data were not available for most seasons in our study.
Random forest and LASSO regression models are not sensitive to redundant (highly collinear) features (Kuhn & Johnson, 2019), but we chose to downsize the original set of candidate predictors to minimize the impact of multicollinearity on variable importance scores. For both types of models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection (Kuhn & Johnson, 2019). In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores (Kuhn & Johnson, 2019). We first confirmed that none of the candidate predictors had zero variance or near-zero variance. Because seasonal lags of each viral fitness metric are highly collinear, we included only one lag of each evolutionary predictor, with a preference for the lag that had the strongest univariate correlations with various epidemic metrics. We checked for multicollinearity among the remaining predictors by examining Spearman’s rank correlation coefficients between all pairs of predictors. If a particular pair of predictors was highly correlated (Spearman’s 𝜌 > 0.8), we retained only one predictor from that pair, with a preference for the predictor that had the strongest univariate correlations with various epidemic metrics. Lastly, we performed QR decomposition of the matrix of remaining predictors to determine if the matrix is full rank and identify sets of columns involved in linear dependencies. This step did not eliminate any additional predictors, given that we had already removed pairs of highly collinear variables based on Spearman correlation coefficients.
After these preprocessing steps, our final set of model predictors included 21 variables, including 8 viral evolutionary indicators: H3 epitope distance (t – 2), HI log2 titer distance (t – 2), H3 RBS distance (t – 2), H3 non-epitope distance (t – 2), N2 epitope distance (t – 1), N2 non-epitope distance (t – 1), and H3 and N2 LBI diversity (s.d.) in the current season; 6 proxies for type/subtype interference and prior immunity: A(H1N1) and B epidemic sizes in the current and prior season, A(H3N2) epidemic size in the prior season, and the dominant IAV subtype in the prior season; and 7 proxies for vaccine-induced immunity: A(H3N2) VE in the current and prior season, H3 and N2 epitope distances between circulating strains and the vaccine strain in each season, the combined metric of adult vaccination coverage × VE in the current and prior season, and adult vaccination coverage in the prior season.”
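The pairwise collinearity filter described in the quoted passage can be illustrated with a minimal Python sketch. It is not the code used in the manuscript: the function names, the toy predictors `x1` to `x3`, and the use of the absolute value of Spearman's 𝜌 are assumptions made for illustration, and the QR decomposition step is not covered.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Assumes neither input is constant (no zero-variance predictors).
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def drop_collinear(predictors, target, threshold=0.8):
    """Keep one predictor from each highly correlated pair.

    Assumption: |rho| is compared with the threshold, and the predictor
    with the weaker univariate correlation with the target is dropped.
    """
    strength = {name: abs(spearman(vals, target))
                for name, vals in predictors.items()}
    kept = dict(predictors)
    for a, b in combinations(list(predictors), 2):
        if a in kept and b in kept:
            if abs(spearman(kept[a], kept[b])) > threshold:
                weaker = a if strength[a] < strength[b] else b
                del kept[weaker]
    return sorted(kept)


# Toy data (hypothetical): x2 is nearly a copy of x1, x3 is unrelated.
target = [1, 2, 3, 4, 5, 6]
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 2, 3, 4, 6, 5],
    "x3": [3, 1, 6, 2, 5, 4],
}
print(drop_collinear(predictors, target))
```

In this toy example `x1` and `x2` exceed the 0.8 threshold, and `x2` is dropped because its correlation with the target is weaker.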
(2) Next, we clarify our model training methodology to address the reviewer’s second point about using a leave-one-out cross-validation approach.
We believe the reviewer is mistaken; we use a leave-one-season-out validation approach which lends some robustness to the predictions. In our original submission, we stated “We created each forest by generating 3,000 regression trees from 10 repeats of a leave-one-season-out (jackknife) cross-validated sample of the data. Due to the small size of our dataset, evaluating the predictive accuracy of random forest models on a quasi-independent test set produced unstable estimates.” (Lines 813-816 in the original manuscript)
To clarify, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (see Section 3.4 in Kuhn & Johnson, 2019). To reduce noise, we generated 10 bootstrap resamples of each fold and averaged the RMSE and R2 values of model predictions from resamples.
Although it would be ideal and best practice to measure model performance with an independent test set, our dataset includes only ~20 seasons. We found that predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. Further, we suspect that large antigenic jumps in a small subset of seasons further contribute to variation in prediction accuracy across randomly selected test sets. Our rationale for using cross-validation instead of an independent test set is best described in Section 4.3 of Kuhn and Johnson’s book “Applied Predictive Modeling” (Kuhn & Johnson, 2013):
“When the number of samples is not large, a strong case can be made that a test set should be avoided because every sample may be needed for model building. Additionally, the size of the test set may not have sufficient power or precision to make reasonable judgements. Several researchers (Molinaro 2005; Martin and Hirschberg 1996; Hawkins et al. 2003) show that validation using a single test set can be a poor choice. Hawkins et al. (2003) concisely summarize this point: “holdout samples of tolerable size [...] do not match the cross-validation itself for reliability in assessing model fit and are hard to motivate.” Resampling methods, such as cross-validation, can be used to produce appropriate estimates of model performance using the training set. These are discussed in length in Sect. 4.4. Although resampling techniques can be misapplied, such as the example shown in Ambroise and McLachlan (2002), they often produce performance estimates superior to a single test set because they evaluate many alternate versions of the data.”
In our revised manuscript, we provide additional clarification of our methods (Lines 574-590):
“We created each forest by generating 3,000 regression trees. To determine the best performing model for each epidemic metric, we used leave-one-season-out (jackknife) cross-validation to train models and measure model performance, wherein each “assessment” set is one season of data predicted by the model, and the corresponding “analysis” set contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of each model (Kuhn & Johnson, 2019). Due to the small size of our dataset (~20 seasons), evaluating the predictive accuracy of random forest models on a quasi-independent test set of 2-3 seasons produced unstable estimates. Instead of testing model performance on an independent test set, we generated 10 bootstrap resamples (“repeats”) of each analysis set (“fold”) and averaged the predictions of models trained on resamples (Kuhn & Johnson, 2013, 2019). For each epidemic metric, we report the mean root mean squared error (RMSE) and R2 of predictions from the best tuned model. We used permutation importance (N = 50 permutations) to estimate the relative importance of each predictor in determining target outcomes. Permutation importance is the decrease in prediction accuracy when a single feature (predictor) is randomly permuted, with larger values indicating more important variables. Because many features were collinear, we used conditional permutation importance to compute feature importance scores, rather than the standard marginal procedure (Altmann et al., 2010; Debeer & Strobl, 2020; Strobl et al., 2008; Strobl et al., 2007).”
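The resampling scheme described above can be sketched in Python as follows. This is an illustration only, under stated assumptions: the placeholder `fit_mean_model` stands in for a tuned random forest, the season labels `S1` to `S3` and the values are hypothetical, and all function names are invented for this sketch.

```python
import random
from statistics import mean


def leave_one_season_out(seasons):
    """Yield (held_out, analysis_idx, assessment_idx) for each season."""
    for held_out in sorted(set(seasons)):
        analysis = [i for i, s in enumerate(seasons) if s != held_out]
        assessment = [i for i, s in enumerate(seasons) if s == held_out]
        yield held_out, analysis, assessment


def fit_mean_model(y_train):
    """Placeholder model (assumption): always predicts the training mean."""
    mu = mean(y_train)
    return lambda n: [mu] * n


def cross_validated_predictions(seasons, y, n_boot=10, seed=1):
    """Predict each season from models trained on the remaining seasons.

    Each analysis set is bootstrapped n_boot times, and the predictions
    for the held-out season are averaged across the resamples.
    """
    rng = random.Random(seed)
    preds = [None] * len(y)
    for _, analysis, assessment in leave_one_season_out(seasons):
        boot_preds = []
        for _ in range(n_boot):
            resample = [rng.choice(analysis) for _ in analysis]
            model = fit_mean_model([y[i] for i in resample])
            boot_preds.append(model(len(assessment)))
        for pos, idx in enumerate(assessment):
            preds[idx] = mean(bp[pos] for bp in boot_preds)
    return preds


def rmse(observed, predicted):
    """Root mean squared error of the out-of-fold predictions."""
    return mean((o - p) ** 2 for o, p in zip(observed, predicted)) ** 0.5


# Toy data (hypothetical): two regional observations per season.
seasons = ["S1", "S1", "S2", "S2", "S3", "S3"]
epidemic_size = [10.0, 12.0, 30.0, 28.0, 15.0, 17.0]
predictions = cross_validated_predictions(seasons, epidemic_size)
print(rmse(epidemic_size, predictions))
```

Every observation is predicted only by models that never saw its season, which is what makes the resulting RMSE an out-of-sample estimate even without a separate test set.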
(3) In response to the reviewer’s question about the sensitivity of results when one season is left out, we clarify that the variable importance scores in Figure 8 and model predictions in Figure 9 were generated by models tuned using leave-one-season-out cross-validation.
As explained above, in our leave-one-season-out cross-validation approach, each “assessment” set contains one season of data predicted by the model, and the corresponding “analysis” set (“fold”) contains the remaining seasons. We generated predictions of epidemic metrics and variable importance rankings by averaging the model output of 10 bootstrap resamples of each cross-validation fold.
In Lines 791-806, we describe which epidemic metrics have the highest prediction accuracy and report that random forest models tend to underpredict most epidemic metrics in seasons with high antigenic novelty:
“We measured correlations between observed values and model-predicted values at the HHS region level. Among the various epidemic metrics, random forest models produced the most accurate predictions of A(H3N2) subtype dominance (Spearman’s 𝜌 = 0.95, regional range = 0.85 – 0.97), peak incidence (𝜌 = 0.91, regional range = 0.72 – 0.95), and epidemic size (𝜌 = 0.9, regional range = 0.74 – 0.95), while predictions of effective 𝑅𝑡 and epidemic intensity were less accurate (𝜌 = 0.81, regional range = 0.65 – 0.91; 𝜌 = 0.78, regional range = 0.63 – 0.92, respectively) (Figure 9). Random forest models tended to underpredict most epidemic targets in seasons with substantial H3 antigenic transitions, in particular the SY97 cluster seasons (1998-1999, 1999-2000) and the FU02 cluster season (2003-2004) (Figure 9).
For epidemic size and peak incidence, seasonal predictive error – the root-mean-square error (RMSE) across all regional predictions in a season – increased with H3 epitope distance (epidemic size, Spearman’s 𝜌 = 0.51, P = 0.02; peak incidence, 𝜌 = 0.63, P = 0.004) and N2 epitope distance (epidemic size, 𝜌 = 0.48, P = 0.04; peak incidence, 𝜌 = 0.48, P = 0.03) (Figure 9 – figure supplements 1 – 2). For models of epidemic intensity, seasonal RMSE increased with N2 epitope distance (𝜌 = 0.64, P = 0.004) but not H3 epitope distance (𝜌 = 0.06, P = 0.8) (Figure 9 – figure supplements 1 – 2). Seasonal RMSE of effective 𝑅𝑡 and subtype dominance predictions did not correlate with H3 or N2 epitope distance (Figure 9 – figure supplements 1 – 2).”
I think the competition (interference) results are really interesting, perhaps among the most interesting aspects of this work.
Thank you! We agree that our finding that subtype interference has a greater impact than viral evolution on A(H3N2) epidemics is one of the more interesting results in the study.
Have you seen the paper by Barrat-Charlaix et al? They found that LBI was not good predicting frequency dynamics (see https://pubmed.ncbi.nlm.nih.gov/33749787/); instead, LBI was high for sequences like the consensus sequence, which was near to future strains. LBI also was not positively correlated with epidemic impact in Figure S7.
The local branching index (LBI) measures the rate of recent phylogenetic branching and approximates relative fitness among viral clades, with high LBI values representing greater fitness (Neher et al. 2014).
Two of this study’s co-authors (John Huddleston and Trevor Bedford) are also co-authors of Barrat-Charlaix et al. 2021. Barrat-Charlaix et al. 2021 assessed the performance of LBI in predicting the frequency dynamics and fixation of individual amino acid substitutions in A(H3N2) viruses. Our study is not focused on predicting the future success of A(H3N2) clades or the frequency dynamics or probability of fixation of individual substitutions. Instead, we use the standard deviation and Shannon diversity of LBI values in each season as a proxy for genealogical (clade-level) diversity. We find that, at a seasonal level, low diversity of H3 or N2 LBI values in the current season correlates with greater epidemic intensity, higher transmission rates, and shorter seasonal duration.
In the Discussion we provide an explanation for these correlation results (Lines 848-857):
“The local branching index (LBI) is traditionally used to predict the success of individual clades, with high LBI values indicating high viral fitness (Huddleston et al., 2020; Neher et al., 2014). In our epidemiological analysis, low diversity of H3 or N2 LBI in the current season correlated with greater epidemic intensity, higher transmission rates, and shorter seasonal duration. These associations suggest that low LBI diversity is indicative of a rapid selective sweep by one successful clade, while high LBI diversity is indicative of multiple co-circulating clades with variable seeding and establishment times over the course of an epidemic. A caveat is that LBI estimation is more sensitive to sequence sub-sampling schemes than strain-level measures. If an epidemic is short and intense (e.g., 1-2 months), a phylogenetic tree with our sub-sampling scheme (50 sequences per month) may not incorporate enough sequences to capture the true diversity of LBI values in that season.”
Figure 1 - LBI goes up over time. Is that partly to do with sampling? Overall how do higher sampling volumes in later years impact this analysis? (though you choose a fixed number of sequences so I guess you downsample to cope with that). I note that LBI is likely to be sensitive to sequencing density.
Thank you for pointing this out. We realized that increasing LBI Shannon diversity over the course of the study period was indeed an artefact of increasing sequence volume over time. Our sequence subsampling scheme involves selecting a random sample of up to 50 viruses per month, with up to 25 viruses selected from North America (if available) and the remaining sequences evenly divided across nine other global regions. In early seasons of the study (late 1990s/early 2000s), sampling was often too sparse to meet the 25 viruses/month threshold for North America or for the other global regions combined (H3: Figure 2 - figure supplement 1; N2: Figure 2 - figure supplement 2). Ecological diversity metrics are sensitive to sample size, which explains why LBI Shannon diversity appeared to steadily increase over time in our original submission. In our revised manuscript, we correct for uneven sample sizes across seasons before estimating Shannon diversity and clarify our methodology.
Lines 443-482:
“Clade growth: The local branching index (LBI) measures the relative fitness of co-circulating clades, with high LBI values indicating recent rapid phylogenetic branching (Huddleston et al., 2020; Neher et al., 2014). To calculate LBI for each H3 and N2 sequence, we applied the LBI heuristic algorithm as originally described by Neher et al., 2014 to H3 and N2 phylogenetic trees, respectively. We set the neighborhood parameter 𝜏 to 0.4 and only considered viruses sampled between the current season 𝑡 and the previous season 𝑡 – 1 as contributing to recent clade growth in the current season 𝑡.
Variation in the phylogenetic branching rates of co-circulating A(H3N2) clades may affect the magnitude, intensity, onset, or duration of seasonal epidemics. For example, we expected that seasons dominated by a single variant with high fitness might have different epidemiological dynamics than seasons with multiple co-circulating clades with varying seeding and establishment times. We measured the diversity of clade growth rates of viruses circulating in each season by measuring the standard deviation (s.d.) and Shannon diversity of LBI values in each season. Given that LBI measures relative fitness among co-circulating clades, we did not compare overall clade growth rates (e.g., mean LBI) across seasons.
Each season’s distribution of LBI values is right-skewed and does not follow a normal distribution. We therefore bootstrapped the LBI values of each season in each replicate dataset 1000 times (1000 samples with replacement) and estimated the seasonal standard deviation of LBI from resamples, rather than directly from observed LBI values. We also tested the seasonal standard deviation of LBI from log transformed LBI values, which produced qualitatively equivalent results to bootstrapped LBI values in downstream analyses.
As an alternative measure of seasonal LBI diversity, we binned raw H3 and N2 LBI values into categories based on their integer values (e.g., an LBI value of 0.5 is assigned to the (0,1] bin) and estimated the exponential of the Shannon entropy (Shannon diversity) of LBI categories (Hill, 1973; Shannon, 1948). The Shannon diversity of LBI considers both the richness and relative abundance of viral clades with different growth rates in each season and is calculated as follows:
$${}^{1}D = \exp\left(-\sum_{i=1}^{R} p_i \ln p_i\right)$$
where ${}^{q}D$ is the effective number of categories or Hill numbers of order $q$ (here, clades with different growth rates), with $q$ defining the sensitivity of the true diversity to rare versus abundant categories (Hill, 1973). exp is the exponential function, $p_i$ is the proportion of LBI values belonging to the $i$th category, and $R$ is richness (the total number of categories). Shannon diversity ${}^{1}D$ ($q = 1$) estimates the effective number of categories in an assemblage using the geometric mean of their proportional abundances $p_i$ (Hill, 1973).
Because ecological diversity metrics are sensitive to sampling effort, we rarefied H3 and N2 sequence datasets prior to estimating Shannon diversity so that seasons had the same sample size. For each season in each replicate dataset, we constructed rarefaction and extrapolation curves of LBI Shannon diversity and extracted the Shannon diversity estimate of the sample size that was twice the size of the reference sample size (the smallest number of sequences obtained in any season during the study) (iNEXT R package) (Chao et al., 2014). Chao et al. found that their diversity estimators work well for rarefaction and short-range extrapolation when the extrapolated sample size is up to twice the reference sample size. For H3, we estimated seasonal diversity using replicate datasets subsampled to 360 sequences/season; for N2, datasets were subsampled to 230 sequences/season.”
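The binning and Shannon diversity calculation in the quoted passage can be sketched in Python as below. This is a minimal illustration rather than the manuscript's implementation: the function names and the toy LBI values are hypothetical, and the rarefaction and extrapolation step performed with the iNEXT R package is not reproduced.

```python
import math
from collections import Counter


def lbi_bin(value):
    """Upper edge k of the (k - 1, k] bin, e.g. 0.5 falls in (0, 1] -> 1."""
    return math.ceil(value)


def shannon_diversity(categories):
    """Exponential of the Shannon entropy (Hill number of order q = 1)."""
    counts = Counter(categories)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Toy LBI values for one season (hypothetical).
lbi_values = [0.2, 0.5, 0.9, 1.4, 1.6, 2.3, 2.8, 3.1]
bins = [lbi_bin(v) for v in lbi_values]
print(shannon_diversity(bins))
```

A season whose LBI values all fall in one bin has a Shannon diversity of 1, and a season with values spread evenly over R bins has a Shannon diversity of R, which is why the measure is read as an effective number of categories.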
Estimating the Shannon diversity of LBI from datasets with even sampling across seasons removes the previous secular trend of increasing LBI diversity over time (Figure 2 in revised manuscript).
Figure 3 - I wondered what about the co-dominant times?
In Figure 3, orange points correspond to seasons in which A(H3N2) and A(H1N1) were codominant. We are not sure of the reviewer’s specific question concerning codominant seasons, but if it concerns whether antigenic drift is linked to epidemic magnitude among codominant seasons alone, we cannot perform separate regression analyses for these seasons because there are only two codominant seasons during the 22 season study period.
Figure 4 - Related to drift and epidemic size, dominance, etc. -- when is drift measured, and (if it's measured in season t), would larger populations create more drift, simply by having access to more opportunity (via a larger viral population size)? This is a bit 'devil's advocate' but what if some epidemiological/behavioural process causes a larger and/or later peak, and those gave rise to higher drift?
Seasonal drift is measured as the genetic or antigenic distance between viruses circulating during season t and viruses circulating in the prior season (𝑡 – 1) or two seasons ago (𝑡 – 2).
Concerning the question about whether larger human populations lead to greater rates of antigenic drift, phylogeographic studies have repeatedly found that East-South-Southeast Asia are the source populations for A(H3N2) viruses (Bedford et al., 2015; Lemey et al., 2014), in part because these regions have tropical or subtropical climates and larger human populations, which enable year-round circulation and higher background infection rates. Larger viral populations (via larger host population sizes) and uninterrupted transmission may increase the efficiency of selection and the probability of strain survival and global spread (Wen et al., 2016). After A(H3N2) variants emerge in East-South-Southeast Asia and spread to other parts of the world, A(H3N2) viruses circulate via overlapping epidemics rather than local persistence (Bedford et al., 2015; Rambaut et al., 2008). Each season, A(H3N2) outbreaks in the US (and other temperate regions) are seeded by case importations from outside the US, genetic diversity peaks during the winter, and a strong genetic bottleneck typically occurs at the end of the season (Rambaut et al., 2008).
Due to their faster rates of antigenic evolution, A(H3N2) viruses undergo more rapid clade turnover and dissemination than A(H1N1) and B viruses, despite similar global migration networks across A(H3N2), A(H1N1), and B viruses (Bedford et al., 2015). Bedford et al. speculate that there is typically little geographic differentiation in A(H3N2) viruses circulating in each season because A(H3N2) viruses tend to infect adults, and adults are more mobile than children. Compared to A(H3N2) viruses, A(H1N1) and B viruses tend to have greater genealogical diversity, geographic differentiation, and longer local persistence times (Bedford et al., 2015; Rambaut et al., 2008). Thus, some A(H1N1) and B epidemics are reseeded by viruses that have persisted locally since prior epidemics (Bedford et al., 2015).
Theoretical models have shown that epidemiological processes can influence rates of antigenic evolution (Recker et al., 2007; Wen et al., 2016; Zinder et al., 2013), though the impact of flu epidemiology on viral evolution is likely constrained by the virus’s intrinsic mutation rate.
In conclusion, larger host population sizes and flu epidemiology can indeed influence rates of antigenic evolution. However, given that our study is US-centric and focuses on A(H3N2) viruses, these factors are likely not at play in our study, due to intrinsic biological characteristics of A(H3N2) viruses and the geographic location of our study.
We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader.
Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”
Methods --
L 620 about rescaling and pre- vs post-pandemic times : tell us more - how has reporting changed? could any of this not be because of reporting but because of NPIs or otherwise? Overall there is a lot of rescaling going on. How sensitive are the results to it?
it would be unreasonable to ask for a sensitivity analysis for all the results for all the choices around data preparation, but some idea where there is a reason to think there might be a dependence on one of these choices would be great.
In response to the 2009 A(H1N1) pandemic, the US CDC and WHO increased laboratory testing capacity and strengthened epidemiological networks, leading to substantial, long-lasting improvements to influenza surveillance that are still in place today (https://www.cdc.gov/flu/weekly/overview.htm). At the beginning of the COVID-19 pandemic, influenza surveillance networks were quickly adapted to detect and understand the spread of SARS-CoV-2. The 2009 pandemic occurred over a time span of less than one year, and strict non-pharmaceutical interventions (NPIs), such as lockdowns and mask mandates, were not implemented. Thus, we attribute increases in test volume during the post-2009 period to improved virologic surveillance and laboratory testing capacity rather than changes in care-seeking behavior. In the revised manuscript, we include a figure (Figure 1 - figure supplement 2) that shows systematic increases in test volume in all HHS regions after the 2009 pandemic.
Given the substantial increase in influenza test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results for Spearman correlations and regression models, when adjusting for the pre- and post-2009 pandemic time periods and regional reporting versus only adjusting for the pre-/post-2009 pandemic time periods. Below, we share adjusted versions of Figure 3 (regression results) and Figure 3 - figure supplement 1 (Spearman correlations). Each figure only adjusts for differences in pre- and post-2009 pandemic reporting.
Author response image 1.
Adjustment for pre- and post-2009 pandemic only
Author response image 2.
Adjustment for pre- and post-2009 pandemic only
L635 - Why discretize the continuous LBI distribution and then use Shannon entropy when you could just use the variance and/or higher moments? (or quantiles)? Similarly, why not use the duration of the peak, rather than Shannon entropy? (though there, because presumably data are already binned weekly, and using duration would involve defining start and stop times, it's more natural than with LBI)
We realize that we failed to mention in the methods that we calculated the standard deviation of LBI in each season, in addition to the exponential of the Shannon entropy (Shannon diversity) of LBI. Both the Shannon diversity of LBI values and the standard deviation of LBI values were negatively correlated with effective Rt and epidemic intensity and positively correlated with seasonal duration. The two measures were similarly correlated with effective Rt and epidemic intensity (Figure 3 - figure supplements 2 - 3), while the Shannon diversity of LBI had slightly stronger correlations with seasonal duration than s.d. LBI (Figure 5). Thus, both measures of LBI diversity appear to capture potentially biologically important heterogeneities in clade growth rates.
Separately, we use the inverse Shannon entropy of the incidence distribution to measure the spread of an A(H3N2) epidemic during the season, following the methods of Dalziel et al. 2018. The peak of an epidemic is a single time point at which the maximum incidence occurs. We have not encountered “the duration of the peak” before in epidemiology terminology, and, to our knowledge, there is not a robust way to measure the “duration of a peak,” unless one were to measure the time span between multiple points of maximum incidence or designate an arbitrary threshold for peak incidence that is not strictly the maximum incidence. Given that Shannon entropy is based on the normalized incidence distribution over the course of the entire influenza season (week 40 to week 20), it does not require designating an arbitrary threshold to describe epidemic intensity.
L642 - again why normalize epidemic intensities, and how sensitive are the results to this? I would imagine given that the RF results were unstable under leave-one-out analysis that some of those results could be quite sensitive to choices of normalization and scaling.
Epidemic intensity, defined as the inverse Shannon entropy of the incidence distribution, measures the spread of influenza cases across the weeks in a season. Following Dalziel et al. 2018, we estimated epidemic intensity from normalized incidence distributions rather than raw incidences so that epidemic intensity is invariant under differences in reporting rates and/or attack rates across regions and seasons. If we were to use raw incidences instead, HHS regions or seasons could have the appearance of greater or lower epidemic intensity (i.e., incidence concentrated within a few weeks or spread out over several weeks), due to differences in attack rates or test volume, rather than fundamental differences in the shapes of their epidemic curves. In other words, epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season.
In the methods section, we provide further clarification for why epidemic intensities are based on normalized incidence distributions rather than raw incidences.
Lines 206-209: “Epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season. Following the methodology of Dalziel et al. 2018, epidemic intensity values were normalized to fall between 0 and 1 so that epidemic intensity is invariant to differences in reporting rates and/or attack rates across regions and seasons.”
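The invariance property described above can be checked with a short Python sketch. It assumes epidemic intensity is computed as the reciprocal of the Shannon entropy of the normalized weekly incidence distribution; the min-max rescaling to [0, 1] is one plausible reading of "normalized to fall between 0 and 1" and is an assumption here, as are the function names and toy incidence curves.

```python
import math


def shannon_entropy(weekly_incidence):
    """Shannon entropy of the normalized weekly incidence distribution."""
    total = sum(weekly_incidence)
    if total <= 0:
        raise ValueError("incidence must sum to a positive value")
    props = [c / total for c in weekly_incidence if c > 0]
    return -sum(p * math.log(p) for p in props)


def inverse_entropy(weekly_incidence):
    """Reciprocal of the entropy; infinite if all cases fall in one week."""
    h = shannon_entropy(weekly_incidence)
    return math.inf if h == 0 else 1 / h


def min_max_rescale(values):
    """Rescale values to [0, 1] (assumed normalization, for illustration)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy epidemic curves (hypothetical weekly incidences).
sharp = [0, 0, 1, 10, 50, 10, 1, 0]      # cases concentrated in a few weeks
moderate = [2, 5, 10, 15, 15, 10, 5, 2]  # intermediate spread
flat = [9, 9, 9, 9, 9, 9, 9, 9]          # cases spread evenly

# Multiplying every week by a constant mimics a higher reporting rate.
sharp_more_testing = [10 * c for c in sharp]

raw = [inverse_entropy(c) for c in (sharp, moderate, flat)]
print(min_max_rescale(raw))
print(inverse_entropy(sharp), inverse_entropy(sharp_more_testing))
```

Because the entropy depends only on the proportions of cases in each week, scaling the whole curve by a constant leaves the value unchanged, while a more concentrated curve yields a higher intensity than a flat one.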
L643 - more information about what goes into Epidemia (variables, priors) such that it's replicable/understandable without the code would be good.
We now include additional information concerning the epidemic models used to estimate Rt, including all model equations, variables, and priors (Lines 210-276 in Methods).
L667 did you do breakpoint detection? Why linear models? Was log(incidence) used?
In our original submission, we estimated epidemic onsets using piecewise regression models (Lines 666-674 in original manuscript), which model non-linear relationships with breakpoints by iteratively fitting linear models (Muggeo, 2003). Piecewise regression falls under the umbrella of parametric methods for breakpoint detection.
We did not include results from linear models fit to log(incidence) or GLMs with Gaussian error distributions and log links, for two reasons. First, models fit to log-transformed data require non-zero values as inputs. Although breakpoint detection does not necessarily require weeks of zero incidence leading up to the start of an outbreak, limiting the time period for breakpoint detection to weeks with non-zero incidence (so that we could use log-transformed incidence) substantially pushed back the estimated epidemic onset weeks relative to our previous, more biologically plausible estimates. Second, as an alternative to limiting the dataset to weeks with non-zero incidence, we tried adding a small positive number to weekly incidences so that we could fit models to log-transformed incidence for the whole time period spanning epidemic week 40 (the start of the influenza season) to the first week of maximum incidence. Fitting models to log-transformed incidences produced unrealistic breakpoint locations, potentially because log transformations 1) linearize data, and 2) stabilize variance by reducing the impact of extreme values. Due to the short time span used for breakpoint detection, log transforming incidence diminishes abrupt changes in incidence at the beginning of outbreaks, making it difficult for models to estimate biologically plausible breakpoint locations. Log transformations of incidence may be more useful when analyzing time series spanning multiple seasons, rather than short time spans with sharp changes in incidence (i.e., the exponential growth phase of a single flu outbreak).
As an alternative to piecewise regression, our revised manuscript also estimates epidemic onsets using a Bayesian ensemble algorithm that accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (BEAST - a Bayesian estimator of Abrupt change, Seasonal change, and Trend; Zhao et al., 2019). Although a few regional onset times differed across the two methods, our conclusions concerning correlations between viral fitness and epidemic onset timing did not change.
We have rewritten the methods section for estimating epidemic onsets to clarify our methodology and to include the BEAST method (Lines 292-308):
“We estimated the regional onsets of A(H3N2) virus epidemics by detecting breakpoints in A(H3N2) incidence curves at the beginning of each season. The timing of the breakpoint in incidence represents epidemic establishment (i.e., sustained transmission) rather than the timing of influenza introduction or arrival (Charu et al., 2017). We used two methods to estimate epidemic onsets: 1) piecewise regression, which models non-linear relationships with break points by iteratively fitting linear models to each segment (segmented R package) (Muggeo, 2008; Muggeo, 2003), and 2) a Bayesian ensemble algorithm (BEAST – a Bayesian estimator of Abrupt change, Seasonal change, and Trend) that explicitly accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (Rbeast R package) (Zhao et al., 2019). For each region in each season, we limited the time period of breakpoint detection to epidemic week 40 to the first week of maximum incidence and did not estimate epidemic onsets for regions with insufficient signal, which we defined as fewer than three weeks of consecutive incidence and/or greater than 30% of weeks with missing data. We successfully estimated A(H3N2) onset timing for most seasons, except for three A(H1N1) dominant seasons: 2000-2001 (0 regions), 2002-2003 (3 regions), and 2009-2010 (0 regions). Estimates of epidemic onset weeks were similar when using piecewise regression versus the BEAST method, and downstream analyses of correlations between viral fitness indicators and onset timing produced equivalent results. We therefore report results from onsets estimated via piecewise regression.”
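The idea of locating an onset as a breakpoint between two linear segments can be sketched in a few lines of Python. The sketch below is a deliberately simplified stand-in, not the algorithm implemented in the segmented or Rbeast packages: it does a brute-force search over candidate breakpoints, fits an independent least-squares line to each side (the two lines are not constrained to join), and returns the first week of the second segment. The function names, the `min_segment` parameter, and the toy data are hypothetical.

```python
def fit_line_sse(xs, ys):
    """Sum of squared residuals of an ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def estimate_onset(weeks, incidence, min_segment=3):
    """Return the first week of the second segment in the best two-line fit.

    Every split that leaves at least `min_segment` points on each side is
    tried, and the split with the smallest total squared error is kept.
    """
    best_week, best_sse = None, float("inf")
    for k in range(min_segment, len(weeks) - min_segment + 1):
        sse = (fit_line_sse(weeks[:k], incidence[:k])
               + fit_line_sse(weeks[k:], incidence[k:]))
        if sse < best_sse:
            best_week, best_sse = weeks[k], sse
    return best_week


# Hypothetical season: flat baseline for weeks 40-45, then linear growth
# from week 46 up to the week of maximum incidence.
weeks = list(range(40, 52))
incidence = [1, 1, 1, 1, 1, 1, 4, 6, 8, 10, 12, 14]
onset_week = estimate_onset(weeks, incidence)
```

A grid search is used here only because it is easy to verify by hand; it says nothing about how the two methods named above handle noise, missing weeks, or uncertainty in the breakpoint.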
L773 national indicators -- presumably this is because you don't have regional-level information, but it might be worth saying that earlier so it doesn't read like there are other indicators now, called national indicators, that we should have heard of
In the revised manuscript, we move a paragraph that was at the beginning of the Results to the beginning of the Methods.
Lines 123-132:
“Our study focuses on the impact of A(H3N2) virus evolution on seasonal epidemics from seasons 1997-1998 to 2018-2019 in the U.S.; whenever possible, we make use of regionally disaggregated indicators and analyses. We start by identifying multiple indicators of influenza evolution each season based on changes in HA and NA. Next, we compile influenza virus subtype-specific incidence time series for U.S. Department of Health and Human Services (HHS) regions and estimate multiple indicators characterizing influenza A(H3N2) epidemic dynamics each season, including epidemic burden, severity, type/subtype dominance, timing, and the age distribution of cases. We then assess univariate relationships between national indicators of evolution and regional epidemic characteristics. Lastly, we use multivariable regression models and random forest models to measure the relative importance of viral evolution, heterosubtypic interference, and prior immunity in predicting regional A(H3N2) epidemic dynamics.”
In Lines 484-487 in the Methods, we now mention that measures of seasonal antigenic and genetic distance are at the national level.
“For each replicate dataset, we estimated national-level genetic and antigenic distances between influenza viruses circulating in consecutive seasons by calculating the mean distance between viruses circulating in the current season 𝑡 and viruses circulating during the prior season (𝑡 – 1 year; one season lag) or two prior seasons ago (𝑡 – 2 years; two season lag).”
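The seasonal distance calculation described above can be sketched as a mean over all pairs of viruses drawn from the current season and a lagged season. The Python sketch below is only illustrative: it uses Hamming distance between aligned sequences as a placeholder metric (the manuscript's genetic and antigenic measures are not reproduced here), and the function names, the season-keyed dictionary, and the toy sequences are all hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_between_season_distance(current, previous, distance=hamming):
    """Mean distance over all (current-season virus, earlier-season virus) pairs."""
    pairs = list(product(current, previous))
    if not pairs:
        raise ValueError("both seasons need at least one virus")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def lagged_distances(seasons, lag=1):
    """Map each season t to its mean distance from season t - lag, when available.

    `seasons` maps a season's start year to a list of aligned sequences.
    """
    out = {}
    for year in sorted(seasons):
        if year - lag in seasons:
            out[year] = mean_between_season_distance(seasons[year], seasons[year - lag])
    return out


# Hypothetical toy data: three seasons of four-site sequences.
seasons = {
    2016: ["AAAA", "AAAT"],
    2017: ["AATT"],
    2018: ["TTTT"],
}
one_season_lag = lagged_distances(seasons, lag=1)
two_season_lag = lagged_distances(seasons, lag=2)
```

Passing a different `distance` function is how another metric would be swapped in under this sketch; seasons without a sampled predecessor at the requested lag are simply omitted.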
L782 Why Beta regression and what is "the resampled dataset" ?
Beta regression is appropriate for models of subtype dominance, epidemic intensity, and age-specific proportions of ILI cases because these data are continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). “The resampled dataset” refers to the “1000 bootstrap replicates of the original dataset (1000 samples with replacement)” mentioned in Lines 777-778 of the original manuscript.
In the revised manuscript, we include more background information about Beta regression models, and explicitly mention that regression models were fit to 1000 bootstrap replicates of the original dataset.
Lines 503-507:
“For subtype dominance, epidemic intensity, and age-specific proportions of ILI cases, we fit Beta regression models with logit links. Beta regression models are appropriate when the variable of interest is continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). For each epidemic metric, we fit the best-performing regression model to 1000 bootstrap replicates of the original dataset.”
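For readers unfamiliar with the two ingredients named above, the following Python sketch shows (a) the log-likelihood of a Beta regression with a logit link, using the mean-precision parameterization of Ferrari & Cribari-Neto (2004), and (b) how bootstrap replicates can be drawn by resampling rows with replacement. It is an illustration under stated assumptions rather than the authors' pipeline: the function names, the fixed seed, and the toy data are hypothetical, and the numerical maximization of the likelihood is not shown.

```python
import math
import random


def beta_loglik(y, x, coefs, phi):
    """Log-likelihood of a Beta regression with a logit link.

    y: responses strictly inside (0, 1)
    x: one row of covariates per response (include 1.0 for an intercept)
    coefs: regression coefficients; phi: precision parameter (> 0)
    Each y_i ~ Beta(mu_i * phi, (1 - mu_i) * phi), with logit(mu_i) = x_i . coefs.
    """
    total = 0.0
    for yi, xi in zip(y, x):
        eta = sum(c * v for c, v in zip(coefs, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
        a, b = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi)
                  + (b - 1.0) * math.log(1.0 - yi))
    return total


def bootstrap_replicates(rows, n_replicates=1000, seed=0):
    """Draw datasets of the original size by sampling rows with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    return [[rows[rng.randrange(n)] for _ in range(n)]
            for _ in range(n_replicates)]


# Hypothetical toy data: (proportion, covariate row) pairs.
rows = [(0.20, [1.0]), (0.55, [1.0]), (0.70, [1.0]), (0.35, [1.0])]
replicates = bootstrap_replicates(rows)

# With an intercept of 0 and phi = 2, each y_i ~ Beta(1, 1), i.e. uniform on (0, 1).
uniform_loglik = beta_loglik([r[0] for r in rows], [r[1] for r in rows], [0.0], 2.0)
```

In a full analysis, the coefficients and precision would be estimated separately on each replicate, and the spread of the estimates across replicates would summarize their uncertainty.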
The github is clear, comprehensive and well-documented, at least at a brief glance.
Thank you! At the time of resubmission, our GitHub repository is updated to incorporate feedback from the reviewers.
References
Altmann, A., Tolosi, L., Sander, O., & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), 1340-1347. https://doi.org/10.1093/bioinformatics/btq134
Barrat-Charlaix, P., Huddleston, J., Bedford, T., & Neher, R. A. (2021). Limited Predictability of Amino Acid Substitutions in Seasonal Influenza Viruses. Mol Biol Evol, 38(7), 2767-2777. https://doi.org/10.1093/molbev/msab065
Bedford, T., Riley, S., Barr, I. G., Broor, S., Chadha, M., Cox, N. J., Daniels, R. S., Gunasekaran, C. P., Hurt, A. C., Kelso, A., Klimov, A., Lewis, N. S., Li, X., McCauley, J. W., Odagiri, T., Potdar, V., Rambaut, A., Shu, Y., Skepner, E., . . . Russell, C. A. (2015). Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature, 523(7559), 217-220. https://doi.org/10.1038/nature14460
Chao, A., Gotelli, N. J., Hsieh, T. C., Sander, E. L., Ma, K. H., Colwell, R. K., & Ellison, A. M. (2014). Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84(1), 45-67. https://doi.org/10.1890/13-0133.1
Charu, V., Zeger, S., Gog, J., Bjornstad, O. N., Kissler, S., Simonsen, L., Grenfell, B. T., & Viboud, C. (2017). Human mobility and the spatial transmission of influenza in the United States. PLoS Comput Biol, 13(2), e1005382. https://doi.org/10.1371/journal.pcbi.1005382
Dalziel, B. D., Kissler, S., Gog, J. R., Viboud, C., Bjornstad, O. N., Metcalf, C. J. E., & Grenfell, B. T. (2018). Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities. Science, 362(6410), 75-79. https://doi.org/10.1126/science.aat6030
Debeer, D., & Strobl, C. (2020). Conditional permutation importance revisited. BMC Bioinformatics, 21(1), 307. https://doi.org/10.1186/s12859-020-03622-2
Dhanasekaran, V., Sullivan, S., Edwards, K. M., Xie, R., Khvorov, A., Valkenburg, S. A., Cowling, B. J., & Barr, I. G. (2022). Human seasonal influenza under COVID-19 and the potential consequences of influenza lineage elimination. Nat Commun, 13(1), 1721. https://doi.org/10.1038/s41467-022-29402-5
Ferrari, S., & Cribari-Neto, F. (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7), 799-815. https://doi.org/10.1080/0266476042000214501
Garten, R. J., Davis, C. T., Russell, C. A., Shu, B., Lindstrom, S., Balish, A., Sessions, W. M., Xu, X., Skepner, E., Deyde, V., Okomo-Adhiambo, M., Gubareva, L., Barnes, J., Smith, C. B., Emery, S. L., Hillman, M. J., Rivailler, P., Smagala, J., de Graaf, M., . . . Cox, N. J. (2009). Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science, 325(5937), 197-201. https://doi.org/10.1126/science.1176225
Grebe, K. M., Yewdell, J. W., & Bennink, J. R. (2008). Heterosubtypic immunity to influenza A virus: where do we stand? Microbes Infect, 10(9), 1024-1029. https://doi.org/10.1016/j.micinf.2008.07.002
Hill, M. O. (1973). Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology, 54(2), 427-432. https://doi.org/10.2307/1934352
Huddleston, J., Barnes, J. R., Rowe, T., Xu, X., Kondor, R., Wentworth, D. E., Whittaker, L., Ermetal, B., Daniels, R. S., McCauley, J. W., Fujisaki, S., Nakamura, K., Kishida, N., Watanabe, S., Hasegawa, H., Barr, I., Subbarao, K., Barrat-Charlaix, P., Neher, R. A., & Bedford, T. (2020). Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution. Elife, 9, e60067. https://doi.org/10.7554/eLife.60067
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). Springer.
Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. Chapman and Hall/CRC.
Lee, E. C., Arab, A., Goldlust, S. M., Viboud, C., Grenfell, B. T., & Bansal, S. (2018). Deploying digital health data to optimize influenza surveillance at national and local scales. PLoS Comput Biol, 14(3), e1006020. https://doi.org/10.1371/journal.pcbi.1006020
Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., & Suchard, M. A. (2014). Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog, 10(2), e1003932. https://doi.org/10.1371/journal.ppat.1003932
Muggeo, V. (2008). Segmented: An R Package to Fit Regression Models With Broken-Line Relationships. R News, 8, 20-25.
Muggeo, V. M. (2003). Estimating regression models with unknown break-points. Stat Med, 22(19), 3055-3071. https://doi.org/10.1002/sim.1545
Neher, R. A., Russell, C. A., & Shraiman, B. I. (2014). Predicting evolution from the shape of genealogical trees. Elife, 3, e03568. https://doi.org/10.7554/eLife.03568
Rambaut, A., Pybus, O. G., Nelson, M. I., Viboud, C., Taubenberger, J. K., & Holmes, E. C. (2008). The genomic and epidemiological dynamics of human influenza A virus. Nature, 453(7195), 615-619. https://doi.org/10.1038/nature06945
Recker, M., Pybus, O. G., Nee, S., & Gupta, S. (2007). The generation of influenza outbreaks by a network of host immune responses against a limited set of antigenic types. Proceedings of the National Academy of Sciences, 104(18), 7711-7716. https://doi.org/10.1073/pnas.0702154104
Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical journal, 27(3), 379-423.
Smith, G. J., Vijaykrishna, D., Bahl, J., Lycett, S. J., Worobey, M., Pybus, O. G., Ma, S. K., Cheung, C. L., Raghwani, J., Bhatt, S., Peiris, J. S., Guan, Y., & Rambaut, A. (2009). Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature, 459(7250), 1122-1125. https://doi.org/10.1038/nature08182
Sridhar, S. (2016). Heterosubtypic T-Cell Immunity to Influenza in Humans: Challenges for Universal T-Cell Influenza Vaccines. Front Immunol, 7, 195. https://doi.org/10.3389/fimmu.2016.00195
Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9, 307. https://doi.org/10.1186/1471-2105-9-307
Strobl, C., Boulesteix, A. L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics, 8, 25. https://doi.org/10.1186/1471-2105-8-25
Terajima, M., Babon, J. A., Co, M. D., & Ennis, F. A. (2013). Cross-reactive human B cell and T cell epitopes between influenza A and B viruses. Virol J, 10, 244. https://doi.org/10.1186/1743-422x-10-244
Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M., & Kawaoka, Y. (1992). Evolution and ecology of influenza A viruses. Microbiological Reviews, 56(1), 152-179. https://doi.org/10.1128/mr.56.1.152-179.1992
Wen, F., Bedford, T., & Cobey, S. (2016). Explaining the geographical origins of seasonal influenza A (H3N2). Proc Biol Sci, 283(1838). https://doi.org/10.1098/rspb.2016.1312
Yan, L., Neher, R. A., & Shraiman, B. I. (2019). Phylodynamic theory of persistence, extinction and speciation of rapidly adapting pathogens. Elife, 8. https://doi.org/10.7554/eLife.44205
Zhao, K., Wulder, M. A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X., & Brown, M. (2019). Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, 111181. https://doi.org/10.1016/j.rse.2019.04.034
Zinder, D., Bedford, T., Gupta, S., & Pascual, M. (2013). The Roles of Competition and Mutation in Shaping Antigenic and Genetic Diversity in Influenza. PLOS Pathogens, 9(1).
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This work by Shin et al. demonstrated that a different form of PTH (R25C PTH) generated a comparable anabolic signal to rhPTH 1-34 using a large animal model. This valuable finding may have therapeutic potential in promoting bone formation or the healing process, and the methods seem solid, although there remains a concern regarding the small sample size and surgical procedure.
-
Reviewer #1 (Public Review):
Summary:
This study, titled "Enhancing Bone Regeneration and Osseointegration using rhPTH(1-34) and Dimeric R25CPTH(1-34) in an Osteoporotic Beagle Model," provides valuable insights into the therapeutic effects of two parathyroid hormone (PTH) analogs on bone regeneration and osseointegration. The research is methodologically sound, employing a robust animal model and a comprehensive array of analytical techniques, including micro-CT, histological/histomorphometric analyses, and serum biochemical analysis.
Strengths:
The use of a large animal model, which closely mimics postmenopausal osteoporosis in humans, enhances the study's relevance to clinical applications. The study is well-structured, with clear objectives, detailed methods, and a logical flow from introduction to conclusion. The findings are significant, demonstrating the potential of rhPTH(1-34) and dimeric R25CPTH(1-34) in enhancing bone regeneration, particularly in the context of osteoporosis.
Weaknesses: There are no major weaknesses.
-
Reviewer #2 (Public Review):
Summary:
This article explores the regenerative effects of recombinant PTH analogues on osteogenesis.
Strengths:
Although PTH is known to induce osteoclast activity and accelerate bone resorption, paradoxically its intermittent use has become a common treatment for osteoporosis. Previous studies successfully demonstrated this phenomenon in vivo, but most of them used rodent models, which have inherent limitations. In this article, the authors address this limitation using a beagle model and assess the osseointegrative effect of recombinant PTH analogues. The authors clearly observed the regenerative effects of the PTH analogues and compared their efficacy using histologic, biochemical, and radiologic measurements in a combined surgical-endocrine large animal model. The data seem to be solid and have potential clinical implications.
Weaknesses:
All the issues that I raised have been resolved in the revision process.
Overall, this paper is well-written and has clarity and consistency for a broader readership.
-
Reviewer #3 (Public Review):
Summary:
The work submitted by Dr. Jeong-Oh Shin and co-workers aims to investigate the therapeutic efficacy of rhPTH(1-34) and R25CPTH(1-34) on bone regeneration and osseointegration of titanium implants using a postmenopausal osteoporosis animal model.
In my opinion, the findings presented are not strongly supported by the provided data, since the methods used do not adequately support the primary claims.
Strengths:
Strengths include certain good technologies utilized to perform histological sections (i.e. the EXAKT system).
Weaknesses:
Certain weaknesses significantly lower the enthusiasm for this work. Most important: the limited number of samples/group. In fact, as presented, the work has an n=4 for each treatment group. This limited number of samples/group significantly impairs the statistical power of the study. In addition, the implants were surgically inserted following a "conventional implant surgery", implying that no precise/guided insertion was utilized. This weakness is, in my opinion, particularly significant since the amount of bone osteointegration may greatly depend on the bucco-lingual positioning of each implant at the time of the surgical insertion (which should, therefore, be precisely standardized across all animals and for all surgical procedures).
Comments on current version:
As mentioned in my first review, this work is significantly underpowered for the following reasons: 1) n=4 for each treatment group.; 2) no randomization of the surgical sites receiving treatments; 3) implants surgically inserted without precision/guided surgery. The authors have not addressed these concerns.
On a minor note: not sure why the authors present a methodology to evaluate the dynamic bone formation (line 272) but do not present results (i.e. by means of histomorphometrical analyses) utilizing this methodology.
-
Author response:
The following is the authors’ response to the original reviews.
Response to Reviewer 1
(Cys25)PTH(1-84) does not show efficacy surpassing that of the previously used rhPTH(1-34). This needs to be discussed biologically and clinically.
Thank you very much for your valuable comments for enhancing the manuscript. We appreciate your input and have noted that this aspect was not addressed in the discussion. The authors have included the following paragraph in discussion section.
“This biological difference is thought to be due to dimeric R25CPTH(1-34) exhibiting a more preferential binding affinity for the RG versus R0 PTH1R conformation, despite having a diminished affinity for either conformation. Additionally, the potency of cAMP production in cells was lower for dimeric R25CPTH compared to monomeric R25CPTH, consistent with its lower PTH1R-binding affinity. (Noh et al., 2024) One of the potential clinical advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow for a more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and bone resorption by osteoclast activity. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial in the context of long-term treatment strategies. (Noh et al., 2024) Also, the effects of dimer were prominent, as we mentioned better bone formation than the control group.” (2nd paragraph, Discussion section)
The terms (Cys25)PTH(1-84) and Dimeric R25CPTH(1-34) are being used interchangeably and incorrectly. A unification of these terms is necessary.
We totally agree with the reviewer’s notion. R25CPTH(1-84) represents mutated human PTH, whereas rhPTH(1-34) and dimeric R25CPTH(1-34) are synthesized PTH analogs. To clarify the terminology, we have revised it throughout the manuscript; the changes appear in red.
The figure legend is incorrect. Not all figures are described, and even though there are figures from A to I, only up to E is explained, or the content is different.
We apologize for our negligence. As suggested by the reviewer, we have corrected the figure legends, which appear before the list of references in the manuscript, as follows.
“Figure legends
Figure 1. Micro-CT analysis (A-D) Experimental design for the controlled delivery of rhPTH(1-34) and dimeric R25CPTH(1-34) in ovariectomized beagle model. Representative images for injection and placement of titanium implant. (E) Micro-CT analysis. Bone mineral density (BMD), bone volume (TV; mm³), trabecular number (Tb.N; 1/mm), trabecular thickness (Tb.Th; µm), trabecular separation (Tb.Sp; µm). Error bars indicate standard deviation. Data are shown as mean ± s.d. *p<0.05, **p<0.01, ***p<0.001, n.s., not significant. P, posterior. R, right
Figure 2. (A-I) Histological analysis of the different groups stained in Goldner’s trichrome. The presence of bone is marked by the green color and soft tissue in red. Red arrows indicate the position with soft tissues without bone around the implant threads. The area of bone formed was the widest in the rhPTH(1-34)-treated group. In the dimeric R25CPTH(1-34)-treated group, there is a greater amount of bone than in the vehicle-treated group. Green arrows represent the bone formed over the implant. Blue dotted line, margin of bone and soft tissue; Scale bars: 1 mm
Figure 3. Histological analysis using Masson trichrome staining results in the rhPTH(1-34) and dimeric R25CPTH(1-34)-treated group (A-L) Masson trichrome-stained sections of cancellous bone in the mandibular bone. The formed bone is marked by the color red. Collagen is stained blue. Black dotted box, magnification region of trabecular bone in the mandible. Scale bars, A-C, G-I: 1 mm; D-F, J-L: 200 µm
Figure 4. Immunohistochemical analysis using TRAP staining for bone remodeling activity (A-L) TRAP staining is used to evaluate bone remodeling by staining osteoclasts. Osteoclasts are shown in purple. Black dotted box, magnification region of trabecular bone in the mandible. (M, N) The number of TRAP-positive cells in the mandible of the rhPTH(1-34) and dimeric R25CPTH(1-34)-treated beagles. Scale bars, A-C, G-I: 1 mm; D-F, J-L: 200 µm. Error bars indicate standard deviation. Data are shown as mean ± s.d. *p<0.05, **p<0.01, n.s., not significant
Figure 5. Measurement of biochemical marker dynamics in serum. The serum levels of calcium, phosphorus, P1NP, and CTX across three time points (T0, T1, T2) following treatment with dimeric R25CPTH(1-34), rhPTH(1-34), or control. (A-B) Calcium and phosphorus levels exhibit an upward trend in response to both PTH treatments compared to control, suggesting enhanced bone mineralization. (C) P1NP levels, indicative of bone formation, remain relatively unchanged across time and treatments. (D) CTX levels, associated with bone resorption, show no significant differences between groups. Data points for the dimeric R25CPTH(1-34), rhPTH(1-34), and control are marked by squares, circles, and triangles, respectively, with error bars representing confidence intervals.
Supplementary Figure. Three-dimensional reconstructed image of the bone surrounding the implants. Three-dimensional reconstructed images of the peri-implant bone depicting the osseointegration after different therapeutic interventions. (A) Represents the bone response to recombinant human parathyroid hormone fragment (rhPTH 1-34) treatment, showing the most robust degree of bone formation around the implant in the three groups. (B) Shows the bone response to a modified PTH fragment (dimeric R25CPTH(1-34)), indicating a similar level of bone growth and integration as seen with rhPTH(1-34), although to a slightly lesser extent. (C) Serves as the control group, demonstrating the least amount of bone formation and osseointegration. The upper panel provides a top view of the bone-implant interface, while the lower panel offers a cross-sectional view highlighting the extent of bony ingrowth and integration with the implant surface.”
In Figure 5, although the descriptions of T0, T1, T2 are mentioned in the method section, it would be more clear if there was a timeline like in Figure 1.
Based on the reviewer’s advice, we have indicated the timing of T0, T1, and T2 in the materials & methods section describing the serum biochemical assay, and we have shown a timeline in figure 5.
In Figure 5, instead of having calcium, phosphorus, P1NP, CTX graphs all under Figure 5, it would be more convenient for referencing in the text to label them as Figure 5A, Figure 5B, Figure 5C, Figure 5D.
We understand the reviewer’s comment. As the reviewer suggested, we have corrected the labeling in the text for Figure 5 as follows.
“The levels of calcium, phosphorus, CTX, and P1NP were analyzed over time using RM-ANOVA (Figure 5). There were no significant differences between the groups for calcium and phosphorus at time points T0 and T1 (Figure 5A). However, after the PTH analog was administered at T2 (Figure 5A), the levels were highest in the rhPTH(1-34) group, followed by the dimeric R25CPTH(1-34) group, and then, lowest in the control group, which was statistically significant (Figure 5B,C). (P < 0.05) The differences between the groups over time for CTX and P1NP were not statistically significant (Figure 5D, E).”
Significance should be indicated in the figure (no asterisk present).
Following the reviewer’s comment, we have added asterisks to Figure 5 to indicate significance.
Addition of Figures in Text:
Line 112: change from "figure 2" to "figure 1" / Line 115: mention "figure 1. E"
Line 120: refer to "figure 1. E" / Line 123: change from "figure 3" to "figure 2"
Line 128: refer to "figure 2.A-C" / Line 137: mention "figure 3"
Line 138: refer to "figure 3. A-L" / Line 143: mention "figure 3. A-L"
Line 144: refer to "figure 3. E,F,K,L" / Line 148: mention "figure 4"
Line 150: refer to "figure 4 M,N" / Line 152: mention "figure 4. M,N"
Line 155: refer to "figure 5" / Line 157: mention "figure 5"
Line 159: refer to "figure 5" / Line 171: mention "figure 1 E"
Line 175: refer to "figure 2 M, N"/ Line 194: mention "figure 3"
Thank you for these detailed suggestions. We have corrected the figure labeling in the text; the changes appear in red.
Response to Reviewer 2
First, the authors should clarify why they compared the effects of rhPTH(1-34) and of dimeric R25C2 PTH(1-34)? In most of the parameters, rhPTH(1-34) seems to be superior to dimeric R25C2 PTH(1-34). Why did the authors insist that the anabolic effects of dimer were prominent? Even though implication of dimeric R25C2 PTH(1-34) was drawn from genetic mutation studies, the authors should describe more clearly in the discussion the potential clinical benefits of the dimeric R25C2 PTH(1-34) compared to rhPTH(1-34), especially if dimeric R25C2 PTH(1-34) has just partial agonistic effect in pharmacodynamics.
Thank you for your insightful comments and questions regarding our results for rhPTH(1-34) and dimeric R25CPTH(1-34). rhPTH(1-34) is a well-characterized therapy for osteoporosis. In this study, although rhPTH(1-34) generally showed superior outcomes in most parameters tested, the dimeric R25CPTH(1-34) exhibited specific anabolic effects that are not as pronounced with rhPTH(1-34). We recognized R25CPTH(1-34) as an anabolic effector. One of the potential advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow for a more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and bone resorption by osteoclast activity. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial in the context of long-term treatment strategies. Also, based on our results, we note that the effects of the dimer were prominent in that it produced better bone formation than the control group. We appreciate your input and have noted that this aspect was not addressed in the discussion. As a result, we have included the following paragraph in the discussion section.
“This biological difference is thought to be due to dimeric R25CPTH(1-34) exhibiting a more preferential binding affinity for the RG versus R0 PTH1R conformation, despite having a diminished affinity for either conformation. Additionally, the potency of cAMP production in cells was lower for dimeric R25CPTH compared to monomeric R25CPTH, consistent with its lower PTH1R-binding affinity. (Noh et al., 2024) One of the potential clinical advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow for a more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and bone resorption by osteoclast activity. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial in the context of long-term treatment strategies. (Noh et al., 2024) Also, the effects of dimer were prominent, as we mentioned better bone formation than the control group.” (2nd paragraph, Discussion section)
Second, please describe the intermittent and continuous application of PTH analogues. Many of the readers may misunderstand that the authors' daily injection of PTHs were actually to mimic the clinical intermittent application or continuous one. Incorporation of the author's intention for experimental design would be more helpful for readers.
Thank you for your insightful comments regarding the need for clearer differentiation between intermittent and continuous applications of PTH analogs in this study. We appreciate your concern that the readers may not fully grasp whether our daily injection protocol was intended to mimic clinical intermittent or continuous PTH administration. To address this, we have revised the manuscript to explicitly clarify that the daily injections of rhPTH(1-34) and dimeric R25CPTH(1-34) were designed to simulate the intermittent dosing regimen commonly used in clinical practice. This regimen is known to maximize the anabolic effects on bone while minimizing potential catabolic actions associated with more frequent or continuous hormone exposure. We have added detailed explanations in the Introduction, Methods, and Discussion sections to help readers understand our experimental design and its relevance to clinical settings.
Introduction section
“Administration of parathyroid hormone (PTH) analogs can be categorized into two distinct protocols: intermittent and continuous. Intermittent rhPTH(1-34) therapy, typically characterized by daily injections, is clinically used to enhance bone formation and strength. This method leverages the anabolic effects of rhPTH(1-34) without significant bone resorption, which can occur with more frequent or continuous exposure. On the other hand, continuous rhPTH(1-34) exposure, often modeled in research as constant infusion, tends to accelerate bone resorption activities, potentially leading to bone loss (Silva and Bilezikian, 2015; Jilka, 2007). Understanding these differences is crucial for interpreting the therapeutic implications of rhPTH(1-34) in bone health.”
Silva, B. C., & Bilezikian, J. P. (2015). Parathyroid hormone: anabolic and catabolic actions on the skeleton. Current Opinion in Pharmacology, 22, 41-50.
Jilka, R. L. (2007). Molecular and cellular mechanisms of the anabolic effect of intermittent PTH. Bone, 40(6), 1434-1446.
Materials and Methods section
“Each animal received one injection per day, aimed at replicating the intermittent rhPTH(1-34) exposure proven beneficial for bone regeneration and overall skeletal health in clinical settings (Neer et al., 2001; Kendler et al., 2018). This regimen was chosen to investigate the potential anabolic effects of these specific PTH analogs under conditions closely resembling therapeutic use.”
Neer, R. M., Arnaud, C. D., Zanchetta, J. R., Prince, R., Gaich, G. A., Reginster, J. Y., Hodsman, A. B., Eriksen, E. F., Ish-Shalom, S., Genant, H. K., Wang, O., & Mitlak, B. H. (2001). Effect of Parathyroid Hormone (1-34) on Fractures and Bone Mineral Density in Postmenopausal Women with Osteoporosis. The New England Journal of Medicine, 344(19), 1434-1441.
Kendler, D. L., Marin, F., Zerbini, C. A. F., Russo, L. A., Greenspan, S. L., Zikan, V., Bagur, A., Malouf-Sierra, J., Lakatos, P., Fahrleitner-Pammer, A., Lespessailles, E., Minisola, S., Body, J. J., Geusens, P., Moricke, R., & Lopez-Romero, P. (2018). Effects of Teriparatide and Risedronate on New Fractures in Post-Menopausal Women with Severe Osteoporosis (VERO): A Multicenter, Double-Blind, Double-Dummy, Randomized Controlled Trial. The Lancet, 391(10117), 230-240.
Discussion section
“The use of daily injections in this study was intended to simulate intermittent PTH therapy, a well-established clinical approach for managing osteoporosis and enhancing bone regeneration. Intermittent administration of PTH, as opposed to continuous exposure, is critical for maximizing the anabolic response while minimizing the catabolic effects that are associated with higher frequency or continuous hormone levels. Our findings support the notion that even with daily administration, both rhPTH(1-34) and dimeric R25CPTH(1-34) promote bone formation and osseointegration, consistent with the outcomes expected from intermittent therapy. It’s important for future research to consider the dosage and timing of administration to further optimize the therapeutic benefits of PTH analogs (Dempster et al., 2001; Hodsman et al., 2005).”
Dempster, D. W., Cosman, F., Kurland, E. S., Zhou, H., Nieves, J., Woelfert, L., Shane, E., Plavetic, K., Müller, R., Bilezikian, J., & Lindsay, R. (2001). Effects of Daily Treatment with Parathyroid Hormone on Bone Microarchitecture and Turnover in Patients with Osteoporosis: A Paired Biopsy Study. Journal of Bone and Mineral Research, 16(10), 1846-1853.
Hodsman, A. B., Bauer, D. C., Dempster, D. W., Dian, L., Hanley, D. A., Harris, S. T., Kendler, D. L., McClung, M. R., Miller, P. D., Olszynski, W. P., Orwoll, E., & Yuen, C. K. (2005). Parathyroid Hormone and Teriparatide for the Treatment of Osteoporosis: A Review of the Evidence and Suggested Guidelines for Its Use. Endocrine Reviews, 26(5), 688-703.
Third, please unify the nomenclature. Ensure consistency in the nomenclature throughout the article. Unify the naming conventions for PTH analogues, such as rhPTH(1-34) vs teriparatide and (Cys25)PTH(1-84) vs R25CPTH(1-34) vs R25CPTH(1-34) vs (1-84). Choose one nomenclature for each analogue and use it consistently throughout the article.
We fully agree with the reviewer. R25CPTH(1-84) represents mutated human PTH, whereas rhPTH(1-34) and dimeric R25CPTH(1-34) are synthesized PTH analogs. To clarify the terminology, we have revised it throughout the manuscript; the changes appear in red.
Response to Reviewer 3
I would recommend rewriting the manuscript in a form that is more understandable to readers. In fact, it appears to me that this work was originally formatted in a way that would need the Materials and Methods to precede the Results. As presented (and as requested by the eLife formatting), the Materials and Methods are available only at the end of the reading and, as a consequence, the reader needs to refer to the Materials and Methods to gain a general and initial understanding of the study design (e.g., the type of treatment for each group is not well specified in the Results section).
Thank you for your constructive comments and suggestions regarding the manuscript. We appreciate your feedback on the overall organization of the manuscript. As the reviewer mentioned, the Materials and Methods were placed after the Discussion section in accordance with the format of the eLife journal. To give readers a better initial understanding, a description of each experimental group has been added to the Results section as follows. Thank you again for your valuable comments.
“The aim of this study was to evaluate and compare the efficacy of rhPTH(1-34) and the dimeric R25CPTH(1-34) in promoting bone regeneration and healing in a clinically relevant animal model. In our study, beagle dogs were selected as the model due to their anatomical similarity to human oral structures, suitable size for surgeries, human-like bone turnover rates, and established oral health profiles, ensuring comparable and ethically sound research outcomes. Three groups were used: a normal saline-injected control group, a group injected with 40 µg/day PTH (Forsteo, Eli Lilly), and a group injected with 40 µg/day of the PTH analog. Animals in each group were injected subcutaneously for 10 weeks.”
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This important paper on measuring molecular connectivity using combined serotonin PET and resting-state fMRI provides both novel methods for studying the brain as well as insights into the effects of ecstasy administration. The methods are solid, with a few doubts that need to be dispelled surrounding the high anaesthetic dose used.
-
Reviewer #1 (Public Review):
This paper by Ionescu et al. applies novel brain connectivity measures based on fMRI and serotonin PET both at baseline and following ecstasy use in rats. There are multiple strengths to this manuscript. First, the use of connectivity measures using temporal correlations of 11C-DASB PET, especially when combined with resting state fMRI, is highly novel and powerful. The effects of ecstasy on molecular connectivity of the serotonin network and salience network are also quite intriguing.
I would like the authors to discuss and justify their use of high-dose (1.3%) isoflurane. A recent consensus paper on rat fMRI (Grandjean et al., "A Consensus Protocol for Functional Connectivity Analysis in the Rat Brain.") found that medetomidine combined with low-dose isoflurane provided optimal control of physiology and fMRI signal. To overcome any doubts about the effects of the high-dose anaesthetic, I'd encourage the authors to show the results of their functional connectivity specificity using the same or similar image processing protocol as described in that consensus paper. This is especially true since the fMRI ICs in Figure 2A appear fairly restricted.
I'd also be interested to read more about why the cerebellum was chosen as a reference region, given that serotonin is highly expressed in the cerebellum, and what effects the choice of reference region has on their quantification.
The PET ICs appear less bilateral than the fMRI ICs. Is that simply a thresholding artefact or is it a real signal?
"The data will be made available upon reasonable request" is not sufficient - please deposit the data in an open repository and link to its location.
-
Reviewer #2 (Public Review):
Summary:
The article aims to describe a novel methodology for the study of brain organization, in comparison to fMRI functional connectivity, under rest vs. controlled pharmacological stimulation.
Strengths:
Solid study design with pharmacological stimulation applied to assess the biological significance of functional and (novel) molecular connectivity estimates.
Provides relevant information on the multivariate organization of serotoninergic system in the brain.
Provides relevant information on the sensitivity of traditional (univariate PET analysis, fMRI functional connectivity) and novel (molecular connectivity) methods in measuring pharmacological effects on brain function.
Weaknesses:
While the study protocol is referenced in the paper, it would be useful to at least report whether the study uses bolus, constant infusion, or a combination of the two and the duration of the frames chosen for reconstruction. Minimal details on anesthesia should also be reported, clarifying whether an interaction between the pharmacological agent for anesthesia and MDMA can be expected (whole-brain or in specific regions).
Some terminology is used in a somewhat unclear way. For example, "seed-based" usually refers to seed-to-voxel and not ROI-to-ROI analysis. It is also a bit confusing to have IC1 called the SERT network when in fact all ICs derived from DASB data are SERT networks. Perhaps a different wording could be used (IC1 = SERT xxxxx network; IC2 = SERT salience network).
The sample size for the rats undergoing pharmacological stimulation is limited, which might make the study underpowered. This might not be a problem if the MDMA effect observed is particularly consistent across rats. Information on inter-individual variability of FC, MC, and BPND could be provided in this regard.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This valuable paper investigates how fish avoid thermal disturbances that occur on fast timescales. The authors use a creative experimental approach that quickly creates a vertical thermal interface, which they combine with careful behavioral analyses. The evidence supporting their results is solid, but there is a potential confounding factor between temperature and vertical positioning, and characterization of the thermal interface would greatly assist in interpreting the results.
-
Reviewer #1 (Public Review):
Summary:
The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.
Strengths:
High statistical power, solid quantification of behaviour.
Weaknesses:
A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult.
-
Reviewer #2 (Public Review):
This paper investigates an interesting question: how do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales? Previous work has identified potential strategies for warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While I found the paper interesting and convincing overall, there are a few omissions and choices in the presentation that limit interpretability and clarity.
A main question concerns the thermal interface itself. The authors track this interface using a blue dye that is mixed in with either colder or warmer water before a gate is opened that leads to gravitational flow overlaying the two water temperatures. The dye likely allows the identification of convective currents, which could lead to rapid mixing of water temperatures. However, it is less clear whether it accurately reflects thermal diffusion. This is problematic as the authors identify upward turning behavior around the interface which appears to be the behavioral strategy for avoiding cold water but not warm water. Without knowing the extent of the gradient across the interface, it is hard to know what the fish are sensing. The authors appear to treat the interface as essentially static, leading them to the conclusion that turning away before the interface is reached is likely related to associative learning. However, thermal diffusion could very likely create a gradient across centimeters which is used as a cue by the fish to initiate the turn. In an ideal world, the authors would use a thermal camera to track the relationship between temperature and the dye interface. Absent that, the simulation that is mentioned in passing in the methods section should be discussed in detail in the main text, and results should be displayed in Figure 1. Error metrics on the parameters used in the simulation could then be used to identify turns in subsequent figures that likely are or aren't affected by a gradient formed across the interface.
The authors assume that the thermal interface triggers the upward-turning behavior. However, an alternative explanation, which should be discussed, is that cold water increases the tendency for upward turns. This could be an adaptive strategy since, for temperatures > 4 °C, turning to swim upwards is likely a good strategy to reach warmer water.
The paper currently also suffers from a lack of clarity which is largely created by figure organization. Four main and 38 supplemental figures are very unusual. I give some specific recommendations below but the authors should decide which data is truly supplemental, versus supporting important points made in the paper itself. There also appear to be supplemental figures that are never referenced in the text, which makes traversing the supplements unnecessarily tedious.
The N that was used as the basis for statistical tests and plots should be identified in the figures to improve interpretability. To improve rigor, the experimental procedures should be expanded. Specifically, the paper uses two thermal models which are not detailed at all in the methods section.
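To give a sense of scale for the thermal diffusion concern raised above, a rough order-of-magnitude sketch follows. It assumes pure conduction across a flat interface with no convection, a textbook thermal diffusivity for water of about 1.4 × 10⁻⁷ m²/s, and purely illustrative elapsed times. None of these values come from the study, and the function name is hypothetical.

```python
import math

# Assumed textbook value for the thermal diffusivity of water near
# room temperature (m^2/s); not a value reported in the study.
ALPHA_WATER = 1.4e-7


def diffusion_length_m(elapsed_s: float, alpha: float = ALPHA_WATER) -> float:
    """Characteristic conduction length sqrt(4 * alpha * t), in metres."""
    if elapsed_s < 0:
        raise ValueError("elapsed time must be non-negative")
    return math.sqrt(4.0 * alpha * elapsed_s)


if __name__ == "__main__":
    # Illustrative elapsed times after the gate opens (hypothetical).
    for minutes in (1, 5, 10, 30):
        length_cm = 100.0 * diffusion_length_m(60.0 * minutes)
        print(f"{minutes:>2} min: ~{length_cm:.1f} cm")
```

Under these assumptions the conductive length grows from several millimetres after one minute to a few centimetres after half an hour, which is at least compatible with a gradient spanning centimetres. Convective mixing would likely broaden it further, so a measured temperature profile would still be needed.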
-
Reviewer #3 (Public Review):
In this study, the authors measured the behavioural responses of brown trout to the sudden availability of a choice between thermal environments. The data clearly show that these fish avoid colder temperatures than the acclimation condition, but generally have no preference between the acclimation condition or warmer water (though I think the speculation that the fish are slowly warming up is interesting). Further, the evidence is compelling that avoidance of cold water is a combination of thermotaxis and thermokinesis. This is a clever experimental approach and the results are novel, interesting, and have clear biological implications as the authors discuss. I also commend the team for an extremely robust, transparent, and clear explanation of the experimental design and analytical decisions. The supplemental material is very helpful for understanding many of the methodological nuances, though I admit that I found it overwhelming at times and wonder if it could be pruned slightly to increase readability. Overall, I think the conclusions are generally well-supported by the data, and I have no major concerns.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This study describes the development and validation of an Automated Reproducible Mechano-stimulator (ARM), a valuable tool for standardizing and automating somatosensory behavior experiments. The data supporting the use of the ARM system are compelling, though the determination of whether that device emits any sounds, including in the ultrasonic range when in operation or when at rest, would add value to the study. Nevertheless, the ARM system is anticipated to be popular amongst somatosensory and pain researchers.
-
Reviewer #1 (Public Review):
Allodynia is commonly measured in the pain field using von Frey filaments, which are applied to a body region (usually hindpaw if studying rodents) by a human. While humans perceive themselves as being objective, as the authors noted, humans are far from consistent when applying these filaments. Not to mention, odors from humans, including those of different sexes, can influence animal behavior. There is thus a major unmet need for a way to automate this tedious von Frey testing process and to remove humans from the experiment. I have no major scientific concerns with the study, as the authors did an outstanding job of comparing this automated system to human experimenters in a rigorous and quantitative manner. They even demonstrated that their automated system can be used in conjunction with in vivo imaging techniques.
While it is somewhat unclear how easy and inexpensive this device will be, I anticipate everyone in the pain field will be clamoring to get their hands on a system like this. And given the mechanical nature of the device and the propensity for mice to urinate on things, I also wonder how frequently the device breaks/needs to be repaired. Perhaps some details regarding the cost and reliability of the device would be helpful to include, as these are the two things that could make researchers hesitant to adopt immediately.
The only major technical concern, which is easy to address, is whether the device generates ultrasonic sounds that rodents can hear when idle or operational, across the ultrasonic frequencies that are of biological relevance (20-110 kHz). These sounds are generally alarm vocalizations and can create stress in animals, and/or serve as cues of an impending stimulus (if indeed they are produced by the device).
-
Reviewer #2 (Public Review):
Summary:
Burdge, Juhmka, et al describe the development and validation of a new automated system for applying plantar stimuli in rodent somatosensory behavior tasks. This platform allows the users to run behavior experiments remotely, removing experimenter effects on animals and reducing variability in the manual application of stimuli. The system integrates well with other automated analysis programs that the lab has developed, providing a complete package for standardizing behavior data collection and analysis. The authors present extensive validations of the system against manual stimulus application. Some proof of concept studies also show how the system can be used to better understand the effect of experimenters on behavior and the effects of how stimuli are presented on the micro features of the animal withdrawal response.
Strengths:
If widely adopted, ARM has the potential to reduce variability in plantar behavior studies across and within labs and provide a means to standardize results. The system is well-validated and the results are clearly and convincingly presented. Most claims are well supported by experimental evidence.
Weaknesses:
ARM seems like a fantastic system that could be widely adopted, but no details are given on how a lab could build ARM, thus its usefulness is limited.
The ARM system appears to stop short of hitting the desired forces that von Frey filaments are calibrated toward (Figure 2). This may affect the interpretation of results.
The authors mention that ARM generates minimal noise; however, if those sounds are paired with stimulus presentation they could still prompt a withdrawal response. Including some 'catch' trials in an experiment could test for this.
The experimental design in Figure 2 is unclear: did each experimenter have their own cohort of 10 mice, or was a single cohort of mice shared? If shared, there's some concern about repeat testing.
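The catch-trial suggestion above could be implemented as a randomized schedule in which some trials run the device without the stimulus contacting the paw. The sketch below is one hypothetical way to generate such a schedule and summarize responses. The function names, trial counts, and labels are illustrative assumptions and are not part of the ARM software.

```python
import random


def make_trial_schedule(n_stim: int, n_catch: int, seed: int = 0) -> list[str]:
    """Return a shuffled list of 'stim' and 'catch' trial labels.

    'catch' marks a trial where the device moves and makes its usual
    sounds but no stimulus touches the paw (hypothetical protocol).
    """
    if n_stim < 0 or n_catch < 0:
        raise ValueError("trial counts must be non-negative")
    schedule = ["stim"] * n_stim + ["catch"] * n_catch
    # A local, seeded generator keeps the order reproducible across sessions.
    random.Random(seed).shuffle(schedule)
    return schedule


def withdrawal_rate(responses: list[bool]) -> float:
    """Fraction of trials on which the animal withdrew its paw."""
    return sum(responses) / len(responses) if responses else 0.0


if __name__ == "__main__":
    schedule = make_trial_schedule(n_stim=10, n_catch=5, seed=42)
    print(schedule)
```

Comparing `withdrawal_rate` on catch trials with the rate on stimulus trials would give a simple indication of whether device sounds alone prompt a response.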
-
Reviewer #3 (Public Review):
Summary:
This report describes the development and initial applications of the ARM (Automated Reproducible Mechano-stimulator), a programmable tool that delivers various mechanical stimuli to a select target (most frequently, a rodent hindpaw). Comparisons to traditional testing methods (e.g., experimenter application of stimuli) reveal that the ARM reduces variability in the anatomical targeting, height, velocity, and total time of stimulus application. Given that the ARM can be controlled remotely, this device was also used to assess the effect of the experimenter's presence on reflexive responses to mechanical stimulation. Lastly, the ARM was used to stimulate rodent hind paws while measuring neuronal activity in the basolateral nucleus of the amygdala (BLA), a brain region that is associated with the negative effect of pain. This device, and similar automated devices, will undoubtedly reduce experimenter-related variability in reflexive mechanical behavior tests; this may increase experimental reproducibility between laboratories.
Strengths:
Clear examples of variability in experimenter stimulus application are provided and then contrasted with uniform stimulus application that is inherent to the ARM.
Weaknesses:
Limited details are provided for statistical tests and inappropriate claims are cited for individual tests. For example, in Figure 2, differences between researchers at specific forces are reported to be supported by a 2-way ANOVA; these differences should be derived from a post-hoc test that was completed only if the independent variable effects (or interaction effect) were found to be significant in the 2-way ANOVA. In other instances, statistical test details are not provided at all (e.g., Figures 3B, 3C, Figure 4, Figure 6G).
One of the arguments for using the ARM is that it will minimize the effect that the experimenter's presence may have on animal behavior. In the current manuscript, the effects of the experimenter's presence on both habituation time and aspects of the withdrawal reflex are minimal for Researcher 2 and non-existent for Researcher 1. This is surprising given that Researcher 2 is female; the effect of experimenter presence was previously documented for male experimenters as the authors appropriately point out (Sorge et al. PMID: 24776635). In general, this argument could be strengthened (or perhaps negated) if more than N=2 experimenters were included in this assessment.
The in vivo BLA calcium imaging data feel out of place in this manuscript. Is the point of Figure 6 to illustrate how the ARM can be coupled to Inscopix (or other external inputs) software? If yes, the following should be addressed: why do the up-regulated and down-regulated cell activities start increasing/decreasing before the "event" (i.e., stimulus application) in Figure 6F? Why are the paw withdrawal latencies and paw distance travelled values in Figures 6I and 6J respectively so much faster/shorter than those illustrated in Figure 5 where the same approach was used?
Another advance of this manuscript is the integration of a 500 fps camera (as opposed to a 2000 fps camera) in the PAWS platform. To convince readers that the use of this more accessible camera yields similar data, a comparison of the results for cotton swabs and pinprick should be completed between the 500 fps and 2000 fps cameras. In other words, repeat Supplementary Figure 3 with the 2000 fps camera and compare those results to the data currently illustrated in this figure.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This study makes a valuable contribution to understanding antiviral responses in fish by revealing a role for the cell cycle protein kinase CDK2 in type I interferon signaling. The evidence supporting the authors' claims is solid, including both in vivo and in vitro investigative approaches. However, the mechanisms underlying CDK2 activity are not completely established. This work will be of interest to cell biologists, immunologists, and virologists.
-
Reviewer #1 (Public Review):
Summary:
The authors set out to evaluate the regulation of interferon (IFN) gene expression in fish, using mainly zebrafish as a model system. Similar to more widely characterized mammalian systems, fish IFN is induced during viral infection through the action of the transcription factor IRF3 which is activated by phosphorylation by the kinase TBK1. It has been previously shown in many systems that TBK1 is subjected to both positive and negative regulation to control IFN production. In this work, the authors find that the cell cycle kinase CDK2 functions as a TBK1 inhibitor by decreasing its abundance through the recruitment of the ubiquitinylation ligase, Dtx4, which has been similarly implicated in the regulation of mammalian TBK1. Experimental data are presented showing that CDK2 interacts with both TBK1 and Dtx4, leading to TBK1 K48 ubiquitinylation on K567 and its subsequent degradation by the proteasome.
Strengths:
The strengths of this manuscript are its novel demonstration of the involvement of CDK2 in a process in fish that is controlled by different factors in other vertebrates and its clear and supportive experimental data.
Weaknesses:
The weaknesses of the study include the following. 1) It remains unclear whether the function described for CDK2 is regulatory, that is, it affects TBK1 levels during physiological responses such as viral infection or cell cycle progression, or if it is homeostatic, governing the basal abundance of TBK1 but not responding to signaling. 2) The authors have not explored whether the catalytic activity of CDK2 is required for TBK1 ubiquitinylation and, if so, what its target specificity is. 3) Given the multitude of CDK isoforms in fish, it remains unexplored whether the identified fish CDK2 homolog is a requisite cell cycle regulator or if its action in the cell cycle is redundant with other CDKs.
-
Reviewer #2 (Public Review):
Summary:
In this paper, the authors describe a novel function involving the cell cycle protein kinase CDK2, which binds to TBK1 (an essential component of the innate immune response) leading to its degradation in a ubiquitin/proteasome-dependent manner. Moreover, the E3 ubiquitin ligase, Dtx4, is implicated in the process by which CDK2 increases the K48-linked ubiquitination of TBK1. This paper presents intriguing findings on the function of CDK2 in lower vertebrates, particularly its regulation of IFN expression and antiviral immunity.
Strengths:
(1) The research employs a variety of experimental approaches to address a single question. The data are largely convincing and appear to be well executed.
(2) The evidence is strong and includes a combination of in vivo and in vitro experiments, including knockout models, protein interaction studies, and ubiquitination analyses.
(3) This study significantly impacts the field of immunology and virology, particularly concerning the antiviral mechanisms in lower vertebrates. The findings provide new insights into the regulation of IFN expression and the broader role of CDK2 in immune responses. The methods and data presented in this paper are highly valuable for the scientific community, offering new avenues for research into antiviral strategies and the development of therapeutic interventions targeting CDK2 and its associated pathways.
Weaknesses:
(1) While the study focuses on fish, the broader implications for other lower vertebrates and higher vertebrates are not extensively discussed.
(2) The study heavily relies on specific fish models, which may limit the generalizability of the findings across different species.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This study provides a useful advance in generating mouse oligodendrocytes by direct lineage conversion from cortical astrocytes. The authors demonstrate that Sox10 converts astrocytes to MBP+ oligodendrocytes, whereas Olig2 expression converts astrocytes to PDGFRalpha+ oligodendrocyte progenitor cells. The data supporting the conclusions are solid, but there are concerns regarding select figures and the experiments being limited to in vitro studies.
-
Reviewer #1 (Public Review):
Faiz et al. investigate small molecule-driven direct lineage reprogramming of postnatal mouse astrocytes to oligodendrocyte lineage cells (OLCs). They use a combination of in vitro, in vivo, and computational approaches to confirm lineage conversion and to examine the key underlying transcription factors and signaling pathways. Lentiviral delivery of transcription factors previously reported to be essential in OLC fate determination (Sox10, Olig2, and Nkx2.2) to astrocytes allows for lineage tracing. They found that these transcription factors are sufficient to reprogram astrocytes to iOLCs, but that the OLCs range in maturity level depending on which factor they are transfected with. They followed up with scRNA-seq analysis of transfected and control cultures at 14 DPT, confirming that TF-induced astrocytes take on canonical OLC gene signatures. By performing astrocyte lineage fate mapping, they further confirmed that TF-induced astrocytes give rise to iOLCs. Finally, they examined the distinct genetic drivers of this fate conversion using scRNA-seq and deep learning models of Sox10- astrocytes at multiple time points throughout the reprogramming. These findings are certainly relevant to diseases characterized by the perturbation of OLC maturation and/or myelination, such as Multiple Sclerosis and Alzheimer's Disease. Their application of such a wide array of experimental approaches gives more weight to their findings and allows for the identification of additional genetic drivers of astrocyte to iOLC conversion that could be explored in future studies. Overall, I find this manuscript thoughtfully constructed and only have a few questions to be addressed.
(1) The authors suggest that Sox10- and Olig2- transduced astrocytes result in distinct subpopulations of iOLCs. Considering it was discussed in the introduction that these TFs cyclically regulate one another throughout differentiation, could they speculate as to why such varying iOLCs resulted from the induction of these two TFs?
(2) In Figure 1B it appears that the number of Sox10- MBP+ tdTomato+ cells decreases from D12 to D14. Does this make sense considering MBP is a marker of more mature OLCs?
(3) Previous studies have shown that MBP expression and myelination in vitro occur at the earliest around 4-6 weeks of culturing. When assessing whether further maturation would increase MBP positivity, the authors only cultured cells up to 22 DPT and saw no significant increase. Has a lengthier culture timeline been attempted?
(4) Figure S4D is described as "examples of tdTomatonegzsGreen+OLCmarker+ cells that arose from a tdTomatoneg cell with an astrocyte morphology." The zsGreen+ tdTomato- cell is not convincingly of "astrocyte morphology"; it could be a bipolar OLC. To strengthen the conclusions and remove this subjectivity, more extensive characterizations of astrocyte versus OLC morphology in the introduction or results are warranted. This would make this observation more convincing since there is clearly an overlap in the characteristics of these cell types.
-
Reviewer #2 (Public Review):
The study by Bajohr et al. investigates the important question of whether astrocytes can generate oligodendrocytes by direct lineage conversion (DLR). The authors ectopically express three transcription factors - Sox10, Olig2 and Nkx6.2 - in cultured postnatal mouse astrocytes and use a combination of Aldh1l1-astrocyte fate mapping and live cell imaging to demonstrate that Sox10 converts astrocytes to MBP+ oligodendrocytes, whereas Olig2 expression converts astrocytes to PDGFRalpha+ oligodendrocyte progenitor cells. Nkx6.2 does not induce lineage conversion. The authors use single-cell RNAseq over 14 days post-transduction to uncover molecular signatures of newly generated iOLs.
The potential to convert astrocytes to oligodendrocytes has been previously analyzed and demonstrated. Despite the extensive molecular characterization of the direct astrocyte-oligodendrocyte lineage conversion, the paper by Bajohr et al. does not represent significant progress. The entire study is performed in cultured cells, and it is not demonstrated whether this lineage conversion can be induced in astrocytes in vivo, particularly at which developmental stage (postnatal, adult?) and in which brain region. The authors also state that generating oligodendrocytes from astrocytes could be relevant for oligodendrocyte regeneration and myelin repair, but they don't demonstrate that lineage conversion can be induced under pathological conditions, particularly after white matter demyelination. Specific issues are outlined below.
(1) The authors perform an extensive characterization of Sox10-mediated DLR by scRNAseq and demonstrate a clear trajectory of lineage conversion from astrocytes to terminally differentiated MBP+ iOLCs. A similar type of analysis should be performed after Olig2 transduction, to determine whether transcriptomics of OPC induction overlaps with any phase of MBP+ oligodendrocyte induction.
(2) A complete immunohistochemical characterization of the cultures should be performed at different time points after Sox10 and Olig2 transduction to confirm OL lineage cell phenotypes.
-
-
www.biorxiv.org www.biorxiv.org
-
Author response:
The following is the authors’ response to the original reviews.
eLife assessment
This work presents valuable information about the specificity and promiscuity of toxic effector and immunity protein pairs. The evidence supporting the claims of the authors is currently incomplete, as there is concern about the methodology used to analyze protein interactions, which did not take potential differences in expression levels, protein folding, and/or transient interaction into account. Other methods to measure the strength of interactions and structural predictions would improve the study. The work will be of interest to microbiologists and biochemists working with toxin-antitoxin and effector-immunity proteins.
We thank the reviewers for considering this manuscript. We agree that this manuscript provides a valuable and cross-discipline introduction to new EI pair protein families where we focus on the EI pair’s flexibility and impacts on community structure. As such, we believe we have provided a solid foundation for future studies to examine non-cognate interactions and their possible effects on microbial communities. This, by definition, leaves some areas “incomplete” and, therefore, open for further investigations. While the methods we show do consider potential differences in binding assays, we have more explicitly addressed how “expression, protein folding, and/or transient binding” may play into this expanded EI pair model. We have also tempered the discussion of the proposed model, while also clearly highlighting other published evidence of non-cognate binding interactions between effector and immunity proteins. We have responded to the reviewers’ public comments (italicized below).
In this revised manuscript, we have updated the main text, particularly the Discussion section, to include more careful language, explain past research better, and add new references to works showing non-cognate immunity proteins protecting against effectors in other systems. We have also updated the supplemental files with more analyses; the relevant procedures are in the Materials and Methods.
Public Reviews:
Note: Reviewer 1, who appeared to focus on a subset of the manuscript rather than the whole, based their comments on several inaccuracies, which we discuss below. We found the tone in this reviewer's comments to be, at times, inappropriate, e.g., using "harsh" and "simply too drastic" to imply that common structure-function analyses were outside of the field-standard methods. We also note that the reviewer took a somewhat atypical step in reviewing this manuscript by running and analyzing the potential protein-complex data in AlphaFold2 but did not discuss areas of low confidence within that model that may contradict their conclusions. We are concerned their approach muddled valid scientific criticisms with problematic conclusions.
Reviewer #1 (Public Review):
In this manuscript, Knecht, Sirias et al. describe a toxin-immunity pair from Proteus mirabilis. Their observations suggest that the immunity protein could protect against non-cognate effectors from the same family. They analyze these proteins by dissecting them into domains and constructing chimeras, which leads them to the conclusion that the immunity can be promiscuous and that the binding of immunity is insufficient for protective activity.
Strengths:
The manuscript is well written and the data are very well presented and could be potentially interesting. The phylogenetic analysis is well done, and provides some general insights.
Weaknesses:
(1) Conclusions are mostly supported by harsh deletions and double hybrid assays. The latter assays might show binding, but this method does not have sufficient resolution to report the binding strength. Proteins could still bind, but the binding might be weaker, transient, and out-competed by the target binding.
The phrasing of structure-function analyses as “harsh” is a bit unusual, as other research groups regularly use deletions and hybrid studies. Given the known caveats to deletion and domain substitutions, we included point-mutation analyses for both the effector and immunity proteins, as found on lines 105 - 113 and 255 - 261 in the current manuscript. These caveats are also why we coupled the in vitro binding analyses with in vivo protection experiments in two distinct experimental systems (E. coli and P. mirabilis). Based on this manuscript’s introductory analysis (where we define and characterize the genes, proteins, interactions, phylogenetics, and incidences in human microbiomes), the next apparent questions are beyond the scope of this study. Future approaches would include analyzing purified proteins from the effector (E) and immunity (I) protein families using biochemical and biophysical methods, such as X-ray crystallography and circular dichroism spectroscopy, among others.
Interestingly, most papers in the EI field do not measure EI protein affinity (Jana et al., 2019, Yadav et al., 2021). Notable exceptions are earlier colicin research (Wallis et al., 1995) and a new T6SS EI paper (Bosch et al., 2023) published as we first submitted this manuscript.
(2) While the authors have modeled the structure of toxin and immunity, the toxin-immunity complex model is missing. Such a model allows an alternative, more realistic interpretation of the presented data. Firstly, the immunity protein is predicted to contribute to the binding surface along its entire sequence, except the last two alpha helices (very high confidence model, ipTM>0.8). The N terminus described by the authors contributes one of the toxin-binding surfaces, but this is not the sole binding site. Most importantly, other parts of the immunity protein are predicted to interact closer to the active site (D-E-K residues). Thus, based on the AlphaFold model, the predicted mechanism of immunization remains physically blocking the active site. However, removing the N terminal part, which contributes a large interaction surface, will directly impact the binding strength. Hence, the toxin-immunity co-folding model suggests that proper binding of immunity, contributed by different parts of the protein, is required to stabilize the toxin-immunity complex and to achieve complete neutralization. Alternative mechanisms of neutralization might not be necessary in this case and are difficult to imagine for a DNase.
In response to the reviewer’s comment, we again reviewed the RdnE-RdnI AlphaFold2 complex predictions with the most updated version of ColabFold (1.5.2-patch with PDB100 and MMseqs2) and have included them at the end of these responses [1].
However, the literature reports that computational predictions of E-I complexes often do not match experimental structural results (Hespanhol et al., 2022, Bosch et al., 2023). As such, we chose not to include the predicted cognate and non-cognate RdnE-I complexes from ColabFold (which uses AlphaFold2) and have not included this data in the revised manuscript. (It is notable that reviewer 1 found the proposed expanded model and research so interesting as to directly input and examine the AI-predicted RdnE-RdnI protein interactions in AlphaFold2.)
Discussion of the prevailing toxin-immunity complex model is in the introduction (lines 45-48) and Figure 5E. Further, there are various known mechanisms for neutralizing nucleases and other T6SS effectors, which we briefly state in the discussion (lines 359 - 361). More in-depth, these molecular mechanisms include active-site blocking (Benz et al., 2012), allosteric-site binding (Kleanthous et al., 1999 and Lu et al., 2014), enzymatic neutralization of the target (Ting et al., 2021), and structural disruption of both the active and binding sites (Bosch et al., 2023). Given this diversity of mechanisms, we did not presume to speculate on the as-of-yet unknown mechanism of RdnI protection. We have expanded discussion of these items in the revised manuscript.
(3) Dissection of a toxin into two domains is also not justified from a structural point of view, it is probably based on initial sequence analyses. The N terminus (actually previously reported as Pone domain in ref 21) is actually not a separate domain, but an integral part of the protein that is encased from both sides by the C terminal part. These parts might indeed evolve faster since they are located further from the active site and the central core of the protein. I am happy to see that the chimeric toxins are active, but regarding the conservation and neutralization, I am not surprised, that the central core of the protein fold is highly conserved. However, "deletion 2" is quite irrelevant - it deletes the central core of the protein, which is simply too drastic to draw any conclusions from such a construct - it will not fold into anything similar to an original protein, if it will fold properly at all.
The reviewer’s comment highlights why we turned to the chimera proteins to dissect the regions of RdnE (formerly IdrD-CT), as the deletions could result in misfolded proteins. (We initially examined RdnE in the years before the launch of AlphaFold2.) However, the reviewer is incorrect regarding the N-terminus of RdnE. The PoNe domain, while also a subfamily of the PD-(D/E)XK superfamily, forms a distinct clade of effectors from the PD-(D/E)XK domain in RdnE (formerly IdrD-CT) as seen in Hespanhol et al., 2022; this is true for other DNase effectors as well. Many studies analyzing effectors within the PD-(D/E)XK superfamily only focus on the PD-(D/E)XK domain, removing just this domain from the context of the whole protein (Hespanhol et al., 2022; Jana et al., 2019). Of note, in RdnE, this region alone (containing the DNA-binding domain) is insufficient for DNase activity (unlike in PoNe). We have clarified this distinction in the results section of the current manuscript, visible in Figure 2.
(4) Regarding the "promiscuity" there is always a limit to how similar proteins are, hence when cross-neutralization is claimed authors should always provide sequence similarities. This similarity could also be further compared in terms of the predicted interaction surface between toxin and immunity.
Reviewer 1 points out a fundamental property of protein-protein interactions that has been isolated away from the impacts of such interactions on bacterial community structure. We have provided the whole protein alignments in figure 3 supplemental figure 3, the summary images in Figure 3D, and the protein phylogenetic trees in Figure 3C. We encourage others to consider the protein alignments as percent amino acid sequence similarity is not necessarily a good gauge for protein function and interactions. These data are publicly available on the OSF website associated with this manuscript https://osf.io/scb7z/, and we hope the community explores the data there.
In consideration of the enthusiasm to deeply dive into the primary research data, we have included the pairwise sequence identities across the entire proteins here: Proteus RdnI vs. Rothia RdnI: 23.6%; Proteus RdnI vs. Prevotella RdnI: 16.3%; Proteus RdnI vs. Pseudomonas RdnI: 14.6%; Rothia RdnI vs. Prevotella RdnI: 22.4%; Rothia RdnI vs. Pseudomonas RdnI: 17.6%; Prevotella RdnI vs. Pseudomonas RdnI: 19.5%. (As stated in response to reviewer 1 comment 2, we did not find it appropriate to make inferences based on AlphaFold2-predicted protein complexes.)
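For readers who wish to compute comparable figures from the publicly available alignments, the following is a minimal sketch of one common way to derive percent identity from a pairwise alignment. It is an illustration only: the response does not state which alignment tool or which denominator was used, so the function name `percent_identity`, the `-` gap character, and the choice to count only columns where neither sequence has a gap are all assumptions, and other conventions will give different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two already-aligned sequences.

    Assumption (not taken from the manuscript): only columns in which
    neither sequence has a gap are compared, and the denominator is the
    number of such columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, invented for illustration; not RdnI sequences.
    print(percent_identity("ACDE-F", "ACDQGF"))
```

Because the choice of denominator (ungapped columns, full alignment length, or the shorter sequence) changes the result, identity values are only comparable when the same convention is applied to every pair.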
Overall, it looks more like a regular toxin-immunity couple, where some cross-reactions with homologues are possible, depending on how far the sequences have deviated. Nevertheless, taking all of the above into account, these results do not challenge toxin-immunity specificity dogma.
In this manuscript, we did not intend to dismiss the E-I specificity model but rather point out its limitations and propose an important expansion of that model that accounts for cross-protection and survival against attacks from other genera. We agree that it is commonly considered that deviations in amino acid sequence over time could result in cross-binding and protection (see lines 364-368). However, the impacts of such cross-binding on community structure, bacterial survival, and strain evolution were rarely addressed in prior literature, with exceptions such as in Zhang et al., 2013 and Bosch et al., 2023 among others. One key insight we propose and show in this manuscript is that cross-binding can be a fitness benefit in mixed communities; therefore, it could be selected for evolutionarily (lines 378-380), even potentially in host microbiomes.
Reviewer #2 (Public Review):
Summary:
The manuscript by Knecht et al entitled "Non-cognate immunity proteins provide broader defenses against interbacterial effectors in microbial communities" aims at characterizing a new type VI secretion system (T6SS) effector immunity pair using genetic and biochemical studies primarily focused on Proteus mirabilis and metagenomic analysis of human-derived data focused on Rothia and Prevotella sequences. The authors provide evidence that RdnE and RdnI of Proteus constitute an E-I pair and that the effector likely degrades nucleic acids. Further, they provide evidence that expression of non-cognate immunity derived from diverse species can provide protection against RdnE intoxication. Overall, this general line of investigation is underdeveloped in the T6SS field and conceptually appropriate for a broad audience journal. The paper is well-written and, aside from a few cases, well-cited. As detailed below however, there are several aspects of this paper where the evidence provided is somewhat insufficient to support the claims. Further, there are now at least two examples in the literature of non-cognate immunity providing protection against intoxication, one of which is not cited here (Bosch et al PMID 37345922 - the other being Ting et al 2018). In general therefore I think that the motivating concept here in this paper of overturning the predominant model of interbacterial effector-immunity cognate interactions is oversold and should be dialed back.
We agree that analyses focusing on flexible non-cognate interactions and protection are underdeveloped within the T6SS field and are not fully explored within a community structure. These ideas are rapidly growing in the field, as evidenced by the references provided by the reviewer. As stated earlier, we did not intend to overturn the prevailing model but rather have proposed an expanded model that accounts for protection against attacks from foreign genera.
Strengths:
One of the major strengths of this paper is the combination of diverse techniques including competition assays, biochemistry, and metagenomics surveys. The metagenomic analysis in particular has great potential for understanding T6SS biology in natural communities. Finally, it is clear that much new biology remains to be discovered in the realm of T6SS effectors and immunity.
Weaknesses:
The authors have not formally shown that RdnE is delivered by the T6SS. Is it the case that there are not available genetics tools for gene deletion for the BB2000 strain? If there are genetic tools available, standard assays to demonstrate T6SS-dependency would be to interrogate function via inactivation of the T6SS (e.g. by deleting tssC).
Our research group showed that the T6SS secretes RdnE (previously IdrD) in Wenren et al., 2013 (cited in lines 71-73). We later confirmed T6SS-dependent secretion by LC-MS/MS (Saak et al., 2017).
For swarm cross-phyla competition assays (Figure 4), at what level compared to cognate immunity are the non-cognate immunity proteins being expressed? This is unclear from the methods and Figure 4 legend and should be elaborated upon. Presumably these non-cognate immunity proteins are being overexpressed. Expression level and effector-to-immunity protein stoichiometry likely matters for interpretation of function, both in vitro as well as in relevant settings in nature. It is important to assess if native expression levels of non-cognate cross-phyla immunity (e.g. Rothia and Prevotella) protect similarly as the endogenously produced cognate immunity. This experiment could be performed in several ways, for example by deleting the RdnE-I pair and complementing back the Rothia or Prevotella RdnI at the same chromosomal locus, then performing the swarm assay. Alternatively, if there are inducible expression systems available for Proteus, examination of protection under varying levels of immunity induction could be an alternate way to address this question. Western blot analysis comparing cognate to non-cognate immunity protein levels expressed in Proteus could also be important. If the authors were interested in deriving physical binding constants between E and various cognate and non-cognate I (e.g. through isothermal titration calorimetry) that would be a strong set of data to support the claims made. The co-IP data presented in supplemental Figure 6 are nice but are from E. coli cells overexpressing each protein and do not fully address the question of in vivo (in Proteus) native expression.
P. mirabilis strain ATCC29906 does not encode the rdnE and rdnI genes on the chromosome (NCBI BioSample: SAMN00001486) (line 151). Production of the RdnI proteins, including the cognate Proteus RdnI, comes from equivalent transgenic expression vectors. Specifically, the rdnI genes were expressed under the flaA promoter in P. mirabilis strain ATCC29906 (Table 1) for the swarm competition assays found in Figure 2C and Figure 4. This promoter results in constitutive expression in swarming cells (Belas et al., 1991; Jansen et al., 2003). In the revised manuscript, figure 4 Supplement Figure 2 shows the relative RdnI protein levels in these strains; we also clarified the expression constructs in the text (see reviewer 3, comment 1).
Lines 321-324, the authors infer differences between E and I in terms of read recruitment (greater abundance of I) to indicate the presence of orphan immunity genes in metagenomic samples (Figure 5A-D). It seems equally or perhaps more likely that there is substantial sequence divergence in E compared to the reference sequence. In fact, metagenomes analyzed were required only to have "half of the bases on reference E-I sequence receiving coverage". Variation in coverage again could reflect divergent sequence dipping below 90% identity cutoff. I recommend performing metagenomic assemblies on these samples to assess and curate the E-I sequences present in each sample and then recalculating coverage based on the exact inferred sequences from each sample.
This comment raises the challenges with metagenomic analyses. It was difficult to balance specificity to a particular species’ DNA sequence with the prevalence of any homologous sequence in the sample. Given the distinction in binding interactions among the examined four species, we opted to prioritize specificity, accepting that we were losing access to some rdnE and rdnI sequences in that decision. We chose a 90% identity cutoff, which, through several in silico controls, ensured that each sequence we identified was the rdnE or rdnI gene from that specific species. For the Version of Record, we have included analysis with a 70% cutoff in the supplemental information to try to account for sequence divergence by lowering the identity cutoffs as suggested. The data from the 70% identity cutoff were consistent with the original data from the 90% identity cutoff.
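To make the trade-off between the identity cutoff and coverage concrete, here is a minimal sketch of how breadth of coverage over a reference gene could be computed after filtering read alignments by percent identity. It is not the authors' pipeline (their code is on the OSF page cited in this response). The function names, the `(start, end, identity)` tuple layout with half-open coordinates, and the toy alignments are hypothetical; only the 90% and 70% identity cutoffs and the "half of the bases" criterion come from the text above.

```python
from typing import Iterable, Tuple

# (start, end, identity): half-open interval on the reference, identity in [0, 1].
Alignment = Tuple[int, int, float]


def breadth_of_coverage(ref_len: int,
                        alignments: Iterable[Alignment],
                        min_identity: float = 0.90) -> float:
    """Fraction of reference bases covered by at least one read alignment
    whose identity is at or above `min_identity`."""
    if ref_len <= 0:
        return 0.0
    covered = [False] * ref_len
    for start, end, identity in alignments:
        if identity < min_identity:
            continue  # divergent reads are discarded by the cutoff
        for pos in range(max(start, 0), min(end, ref_len)):
            covered[pos] = True
    return sum(covered) / ref_len


def is_detected(ref_len: int,
                alignments: Iterable[Alignment],
                min_identity: float = 0.90,
                min_breadth: float = 0.5) -> bool:
    """Call a gene present when at least `min_breadth` of its bases are covered."""
    return breadth_of_coverage(ref_len, alignments, min_identity) >= min_breadth


if __name__ == "__main__":
    # Toy alignments on a 100-base reference, invented for illustration.
    reads = [(0, 40, 0.95), (30, 60, 0.92), (60, 100, 0.75)]
    print(breadth_of_coverage(100, reads, 0.90))
    print(breadth_of_coverage(100, reads, 0.70))
```

In this toy case the third read is counted only under the lower cutoff, which illustrates how a divergent homolog can lower apparent coverage at 90% identity without the gene being absent.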
A description of gene-level read recruitment in the methods section relating to metagenomic analysis is lacking and should be provided.
Noted. We included the raw code and sequences on the OSF website associated with this manuscript https://osf.io/scb7z/.
Reviewer #3 (Public Review):
Summary:
The authors discovered that the RdnE effector possesses DNase activity, and in competition, P. mirabilis having RdnE outcompetes the null strain. Additionally, they presented evidence that the RdnI immunity protein binds to RdnE, suppressing its toxicity. Interestingly, the authors demonstrated that the RdnI homolog from a different phylum (i.e., Actinomycetota) provides cross-species protection against RdnE injected from P. mirabilis, despite the limited identity between the immunity sequences. Finally, using metagenomic data from human-associated microbiomes, the authors provided bioinformatic evidence that the rdnE/rdnI gene pair is widespread and present in individual microbiomes. Overall, the discovery of broad protection by non-cognate immunity is intriguing, although not necessarily surprising in retrospect, considering the prolonged period during which Earth was a microbial battlefield/paradise.
Strengths:
The authors presented a strong rationale in the manuscript and characterized the molecular mechanism of the RdnE effector both in vitro and in the heterologous expression model. The utilization of the bacterial two-hybrid system, along with the competition assays, to study the protective action of RdnI immunity is informative. Furthermore, the authors conducted bioinformatic analyses throughout the manuscript, examining the primary sequence, predicted structural, and metagenomic levels, which significantly underscore the significance and importance of the EI pair.
Weaknesses:
(1) The interaction between RdnI and RdnE appears to be complex and requires further investigation. The manuscript's data does not conclusively explain how RdnI provides a "promiscuous" immunity function, particularly concerning the RdnI mutant/chimera derivatives. The lack of protection observed in these cases might be attributed to other factors, such as a decrease in protein expression levels or misfolding of the proteins. Additionally, the transient nature of the binding interaction could be insufficient to offer effective defenses.
Yes, we agree with the reviewer and hope that grant reviewers share this colleague’s enthusiasm for understanding the detailed molecular mechanisms of RdnE-RdnI binding across genera. In the revised manuscript, we have continued to emphasize such caveats, as the next frontier is clearly understanding the molecular mechanisms for RdnI cognate or non-cognate protection. In the revised manuscript, figure 4 Supplement Figure 2 shows the RdnI protein levels; we also clarified the expression constructs in the text (see reviewer 2, comment 2).
(2) The results from the mixed population competition lack quantitative analysis. The swarm competition assays only yield binary outcomes (Yes or No), limiting the ability to obtain more detailed insights from the data.
The mixed swarm assay is needed when studying T6SS effectors that are primarily secreted during Proteus’ swarming activity (Saak et al. 2017, Zepeda-Rivera et al. 2018). This limitation is one reason we utilize in vitro, in vivo, and bioinformatic analyses. Though the swarm competition assay yields a binary outcome, we are confident that the observed RdnI protection is due to interaction with a trans-cell RdnE via an active T6SS. By contrast, many manuscripts report co-expression of the EI pair (Yadav et al., 2021, Hespanhol et al., 2022) rather than secreted effectors, as we have achieved in this manuscript.
(3) The discovery of cross-species protection is solely evident in the heterologous expression-competition model. It remains uncertain whether this is an isolated occurrence or a common characteristic of RdnI immunity proteins across various scenarios. Further investigations are necessary to determine the generality of this behavior.
We agree, which is why we submitted this paper as a launching point for further investigations into the generality of non-cognate interactions and their potential impact on community structure.
Comments from Reviewing Editor:
- In addition to the references provided by reviewer #2, the first manuscript to show non-cognate binding of immunity proteins was Russell et al 2012 (PMID: 22607806).
- IdrD was shown to form a subfamily of effectors in this manuscript by Hespanhol et al 2022 PMID: 36226828 that analyzed several T6SS effectors belonging to PDDExK, and it should be cited.
We appreciate that the reviewer and eLife staff pointed out missed citations. We have incorporated these studies and cited them in the revised manuscript.
[1] The Proteus RdnE in complex with either the Prevotella or Pseudomonas RdnI showed low confidence at the interface (pLDDT ~50-70%); this AI-predicted complex might support the lack of binding seen in the bacterial two-hybrid assay. On the other hand, the Proteus and Rothia RdnI N-terminal regions show higher confidence at the interface with RdnE. Despite this, the C-terminus of the Proteus RdnI shows especially low confidence (pLDDT ~50%) where it might interact near RdnE’s active site (as suggested by reviewer 1). Given this low confidence and the already stated inaccuracies of AI-generated complexes, we would rather wait for crystallization data to inform potential protection mechanisms of RdnI.
Author response image 1.
-
eLife assessment
This study provides valuable insights into the specificity and promiscuity of toxic effector and immunity protein pairs. While the work is improved over a previous version, there are still some questions regarding the methodology used to draw certain conclusions, rendering the study somewhat incomplete. Nevertheless, this work will likely be of interest to microbiologists and biochemists working with toxin-antitoxin systems and effector-immunity proteins.
-
Reviewer #3 (Public Review):
Summary:
The authors discovered that the RdnE effector possesses DNase activity, and in competition, P. mirabilis having RdnE outcompetes the null strain. Additionally, they presented evidence that the RdnI immunity protein binds to RdnE, suppressing its toxicity. Interestingly, the authors demonstrated that the RdnI homolog from a different phylum (i.e., Actinomycetota) provides cross-species protection against RdnE injected from P. mirabilis, despite the limited identity between the immunity sequences. Finally, using metagenomic data from human-associated microbiomes, the authors provided bioinformatic evidence that the rdnE/rdnI gene pair is widespread and present in individual microbiomes. Overall, the discovery of broad protection by non-cognate immunity is intriguing, although not necessarily surprising in retrospect, considering the prolonged period during which Earth was a microbial battlefield/paradise.
Strengths:
The authors presented a strong rationale in the manuscript and characterized the molecular mechanism of the RdnE effector both in vitro and in the heterologous expression model. The utilization of the bacterial two-hybrid system, along with the competition assays, to study the protective action of RdnI immunity is informative. Furthermore, the authors conducted bioinformatic analyses throughout the manuscript, examining the primary sequence, predicted structural, and metagenomic levels, which significantly underscore the significance and importance of the EI pair.
Weaknesses:
(1) The interaction between RdnI and RdnE appears to be complex and requires further investigation. The manuscript's data does not conclusively explain how RdnI provides a "promiscuous" immunity function, particularly regarding the RdnI mutant/chimera derivatives. The lack of protection observed in these cases might be attributed to other factors, such as a decrease in protein expression levels or misfolding of the proteins. Additionally, the transient nature of the binding interaction could be insufficient to offer effective defenses.
(2) The results from the mixed population competition would benefit from quantitative analysis. The swarm competition assays only yield binary outcomes (Yes or No), limiting the ability to obtain more detailed insights from the data.
(3) The discovery of cross-species protection is solely evident in the heterologous expression-competition model. It remains uncertain whether this is an isolated occurrence or a common characteristic of RdnI immunity proteins across various scenarios. Further investigations are necessary to determine the generality of this behavior.
-
Reviewer #4 (Public Review):
Summary:
Knecht et al. elucidate a Type VI Secretion System (T6SS) effector-immunity pair in Proteus mirabilis. They demonstrate that the effector protein RdnE exhibits DNase activity in vitro and induces toxicity when ectopically expressed in cells, the latter being neutralized by the cognate immunity protein RdnI. The authors identify major regions within RdnI necessary for the interaction and neutralization of RdnE. Notably, they report cross-talk where both cognate and non-cognate RdnI proteins can neutralize RdnE, mitigating its fitness advantage in bacterial co-swarm assays. A comprehensive metagenomic analysis revealed an abundance of rdnI over rdnE genes in most gut samples, suggesting a potential role of rdnI in providing a fitness advantage against bacteria encoding the RdnE effector.
Strengths:
The authors successfully combined biochemical and microbiological experiments with bioinformatics analysis to advance the understanding of the T6SS-mediated population dynamics in bacteria. The co-swarm functional assay is of particular interest as it demonstrates how bacterial strains carrying only rdnI immunity genes could potentially compete in the same niche with other species armed with toxic rdnE effector genes. The manuscript is well-written, and the figures are self-explanatory.
Weaknesses:
(1) How would the authors explain the discrepancy observed in Figure 4 G and Figure 4 S3 B where two RdnI proteins from Prevotella and Pseudomonas genera do not bind to RdnE_Proteus in BACTH, whereas they co-elute with RdnE_Proteus-FLAG with efficiency comparable to the cross-neutralizing RdnI_Rothia? Similarly, the interaction results obtained in BACTH with the truncated RdnI (Figure 4E) or chimeric RdnI (Figure 4I, lane 4) could be a result of an overexpressed T18-fusion variant. An alternative in vitro protein-binding assay would be beneficial.
(2) Based on the bioinformatic analysis the Rothia and Prevotella species harboring rdnE/I genes co-occurred in 5% of metagenomes tested, suggesting that these bacteria could come into contact. The manuscript would benefit greatly if the authors demonstrated that RdnI proteins from Rothia or Prevotella could cross-neutralize their own and their 'neighbor' RdnE effectors, for example in an E. coli viability assay. The cross-neutralizing co-swarming results (Figure 4F) could also be further validated in a viability assay as shown in Figure 2 S1.
(3) Little is known about whether RdnE is delivered via T6SS as a full-length protein or as the shorter C-terminal fragment. There is a possibility that immunity proteins could recognize RdnE regions beyond the C-terminal 138 amino acids that the authors used in their in vitro assays.
-
Reviewer #5 (Public Review):
This work investigates a T6SS effector-immunity pair from Proteus mirabilis. The authors make several interesting claims, particularly regarding the mechanism of effector inhibition by the immunity protein. However, it appears that these claims are not fully supported by the evidence provided.
I have read the revised manuscript, the public reviews, and the authors' updated responses to these reviews. In my opinion, the concerns raised by the reviewers remain relevant even after the authors' revisions. Since previous reviews have excellently described the strengths and weaknesses of this work, I will focus on my major concerns:
(1) The authors describe RdnE-RdnI, a T6SS effector-immunity pair from Proteus mirabilis. RdnE is actually the C-terminal domain of IdrD, a 1581-amino-acid protein containing PAAR and RHS domains. This work does not provide evidence for T6SS-dependent secretion of the effector, instead supplying references to previous works.
(2) While the authors claim the function of the RdnE domain is unknown, it was previously shown to be evolutionarily related to PoNe and TseV, both of which are known DNA nucleases. Although the authors cite the relevant references, they do not clearly disclose this information.
(3) The authors claim that RdnE contains two different domains: the first is the PD-(D/E)XK domain, and the second, referred to as "region 2," follows it. Unfortunately, no structural evidence is provided to support this claim, not even a predicted model demonstrating that these are indeed separate domains.
(4) One of the major claims made in this work is that RdnI binding to RdnE is not sufficient for RdnE inhibition, suggesting a more sophisticated mechanism. The authors base this theory on differences between the ability of RdnI to bind RdnE (shown using bacterial two-hybrid assays) and the ability to protect against RdnE toxicity in swarm competition assays. Specifically, they show that the first 85 amino acids of RdnI bind to the short RdnE domain in the bacterial two-hybrid assay but do not protect against the full-length effector in the swarm competition assay. They also demonstrate that performing seven mutations in conserved residues in RdnE or replacing parts of RdnI with parts from other RdnI homologs leads to the same phenomenon.
While these findings are interesting and even intriguing, in my opinion, the current evidence does not support their theory. A simple explanation for the differences between the assays is that while the N-terminal domain of RdnI is sufficient for binding to RdnE, inhibition of the active site of RdnE requires binding of a second domain to RdnE. In that sense, it should be noted that while the authors use co-IP assays to show the interaction between RdnE and full-length RdnI, they do not use it to show the interaction between RdnE and the first 85 amino acids of RdnI.
(5) The authors claim that a "conserved motif" within RdnI plays a role in the inhibition of RdnE. To investigate this, they replace this motif with sequences from several RdnI homologs, demonstrating that in one case, it is possible to exchange these conserved motifs between RdnI homologs that inhibit Proteus RdnE. However, they also show that even if the conserved motif is taken from an RdnI homolog that cannot inhibit Proteus RdnE, the hybrid protein can still protect cells in a swarm competition assay. This result raises concerns regarding the relevance of this conserved motif.
(6) Lastly, regarding the theory that immunity proteins can protect against non-cognate effectors, it appears that the authors based their theory on a single case where RdnI from Rothia protected against RdnE from Proteus. In my opinion, a more thorough investigation, involving testing many homologs, is needed to substantiate this theory.
-
-
www.biorxiv.org www.biorxiv.org
-
Author response:
The following is the authors’ response to the original reviews.
eLife assessment
In this fundamental study, the authors use innovative fine-scale motion capture technologies to study visual vigilance with high-acuity vision, to estimate the visual fixation of free-feeding pigeons. The authors present convincing evidence for use of the fovea to inspect predator cues, the behavioral state influencing the latency for fovea use, and the use of the fovea decreasing the latency to escape of both the focal individual and other flock members. The work will be of broad interest to behavioral ecologists.
We thank the editor for his interest and feedback on the manuscript. We address the reviewer's comments below.
Reviewer #1 (Public Review):
Summary:
The authors used an innovative technique to study visual vigilance based on high-acuity vision (the fovea). Combining motion-capture features and the visual space around the head, the authors were able to estimate the visual fixation of free-feeding pigeons at any moment. Simulating predator attacks on screens, they showed that 1) pigeons used their fovea to inspect predator cues, 2) the behavioural state (feeding or head-up) influenced the latency to use the fovea and 3) the use of the fovea decreased the latency to escape of both the individual that foveated on the predator cues and the other flock members.
Strengths:
The paper is very interesting, and combines an innovative technique well adapted to studying the importance of high-acuity vision for spotting a predator and for improving the behavioural response (escaping). The results are strong and the models used are well-adapted. This paper is a major contribution to our understanding of the use of visual adaptation in a foraging context when at risk. This is also a major contribution to the understanding of individual interaction in a flock.
Weaknesses:
I have identified only two weaknesses:
(1) The authors often mixed the methods and the results, which reduces the readability and fluidity of the manuscript. I would recommend that the authors restructure the manuscript.
(2) In some parts, the authors stated that they reconstructed the visual field of the pigeon, which is not true. They identified the foveal positions, but not the visual fields, which involve different sectors (binocular, monocular or blind). Similarly, they sometimes mix up the area centralis and the fovea, which are two different visual adaptations.
Thank you for your positive feedback. We addressed these comments by restructuring the methods and results sections as suggested, and by checking the terminology and specific vocabulary used throughout the manuscript.
Reviewer #1 (Recommendations For The Authors):
First, I would like to say that I really enjoyed the manuscript. This is a great contribution to the field.
Thank you for the positive feedback, we highly appreciate it.
I also have some comments that I hope will help the authors to improve the manuscript.
Major comments :
I would recommend that the authors restructure the methods and the results sections. In many parts, the models used are presented in the results section, while they should be presented in the methods section.
Thank you for the suggestion, we have now ensured that the model descriptions are presented in the statistics section of the methods.
To me, the introduction is too long (more than 5 pages). It would be beneficial to reduce it considerably. Furthermore, the introduction lacks some information about the visual abilities of your species (visual acuity, visual field, temporal resolution, contrast sensitivity, etc.).
We agree that the introduction was very long and reduced it by removing the “Methodological issues” as well as strongly reducing the “Experimental rationales” to a minimum. We also added the missing information on the visual abilities of the pigeons in the “Experimental rationales” section (see L135-150). Please note, however, that we refer to the temporal resolution of pigeon vision in the method section, to associate it with the information of the used monitor’s resolution.
Minor comments :
Lines 37-39: This needs a reference.
A reference has been added (McFarland, 1977)
Lines 39-41: But see some papers published recently on Harris's hawks.
Thank you for the references, we added the citation as well as a few more papers (Kane et al., 2015; Kano et al., 2018; Miñano et al., 2023; Yorzinski & Platt, 2014).
Lines 41-43: This sentence needs a reference as well.
A reference has been added (Cresswell, 1994; M. H. R. Evans et al., 2018; Inglis & Lazarus, 1981)
Lines 56-103: In this paragraph, head down and head up also depend on the retinal map of the birds! Some birds have a visual streak that allows them to see potential threats while foraging. Please add more information about the importance of photoreceptor distribution.
Thank you for pointing out this issue. We rewrote the sentence in L65-69 as follows to include the importance of retinal structures.
“In several species, especially those with a broad visual field and specific retinal structures such as the visual streaks, individuals can simultaneously engage in foraging activities while remaining vigilant (Fernández-Juricic, 2012), likely using peripheral vision to detect approaching threats (Bednekoff & Lima, 2005; Cresswell et al., 2003; Kaby & Lind, 2003; Lima & Bednekoff, 1999).”
Lines 76-79: you wrote : ".... favor alternative hypotheses based on their findings". Which findings? You need to explain.
We rewrote this part as follows (L80-81).
“other studies found evidence for the risk dilution (Beauchamp & Ruxton, 2008) and the edge effect (Inglis & Lazarus, 1981) in their study systems.”
Lines 109-110: It would be good to have a representation of what is an area and a fovea, and how it is placed in the eye, what type of fovea exists and how it is related to visual field. Where does it project?
We now give a better description of the pigeon’s visual field in the experimental rationales section, which we hope will help the reader understand the key features of pigeon vision (see L135-150). Specifically, we now say in L137-138:
“they have one fovea centrally located in the retina of each eye, with an acuity of 12.6 c/deg (Hodos et al., 1985). Their fovea projects laterally at ~75° into the horizon in their visual field.”
Lines 109-113: You might need to see some new papers here about the fovea. See for instance Bringmann 2019.
Thank you for the suggestion, we now give a more precise definition of the fovea and refer to Bringmann’s paper for more details (L113-114):
“a pit-like area in the retina with high concentration of cone cells where visual acuity is highest, and is responsible for sharp, detailed, and color vision.”
Lines 113-120: Please explain how the visual field is related to the fovea. Where does the fovea project in the visual field?
Similarly to the question above, we now give a more precise description of the pigeon’s visual field (see L135-150).
Lines 131-134: For a non-expert, you would need to explain what micro, meso and macro scales are.
These sentences were removed when shortening the introduction and we no longer refer to micro, meso and macro scales.
Lines 134-136: Please explain in one sentence the technique here.
We now explain in one sentence how motion capture enables the tracking of head and body orientation (L130-132):
“Motion capture cameras track with high accuracy the 3D position of markers, which, when attached to the pigeon’s head and body, enables to reconstruct the rotations of the head and body in all directions.”
Line 140: You presented here for the first time the word "foveation". Has this term been used before? If so, please add a reference. If not, please explain what you mean by foveation precisely.
Thank you for noticing this lack. We are now providing the following definition “directing visual focus to the fovea to achieve the clearest vision” in the first place where we mention the term foveation (L149-150).
Lines 146-148: Please explain why this proves that it is appropriate not to record eye movements. Is this true for every behaviour?
We acknowledge that some small eye movements might occur and reduce the accuracy of the method. This error is accounted for in the system by using a ±10 degree range around the foveas. The lines the reviewer referred to were removed when shortening the introduction, but we added an explanation in the paragraph describing pigeon vision to make it clearer (L147-150):
“Yet, it should be noted that their eye movement was not tracked in our system, although it is typically confined within a 5 degrees range (Wohlschläger et al., 1993). We thus considered this estimation error of the foveation (directing visual focus to the fovea to achieve the clearest vision) in our analysis, as a part of the error margin (see Methods).”
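The error-margin idea in this response can be illustrated with a minimal sketch. It assumes (this is not taken from the manuscript) a head-centred frame with x pointing forward along the beak, y to the left and z up, foveal axes at ±75° azimuth and 0° elevation, and a circular ±10° tolerance around each axis. The names `fovea_axes`, `angle_deg` and `foveated_by` are hypothetical.

```python
import math

# Assumed values, taken loosely from the text: foveas project ~75 degrees
# laterally; a 10 degree margin absorbs untracked eye movement.
FOVEA_AZIMUTH_DEG = 75.0
MARGIN_DEG = 10.0


def fovea_axes():
    """Unit vectors of the left and right foveal axes in the assumed head frame."""
    a = math.radians(FOVEA_AZIMUTH_DEG)
    return {
        "left": (math.cos(a), math.sin(a), 0.0),
        "right": (math.cos(a), -math.sin(a), 0.0),
    }


def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


def foveated_by(object_direction):
    """Return 'left' or 'right' if the object lies within the margin of a fovea, else None."""
    for side, axis in fovea_axes().items():
        if angle_deg(object_direction, axis) <= MARGIN_DEG:
            return side
    return None
```

Under these assumptions, an object directly in front of the beak, `(1, 0, 0)`, lies 75° from either foveal axis and is therefore not counted as foveated.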
Lines 161-163: What are the frontal and binocular fields for? You would need to explain the different fields of view and what they are supposed to be for.
Furthermore, has the visual field of the pigeon been studied? If so, you would need to add more information about it.
This information is now given in the new paragraph describing the pigeon’s vision in the “Experimental rationales” section (see L135-150).
Figure 1: It is not clear here which panels correspond to a, b or c. Please use some boxes to clarify it.
Thank you for the comment, we now have made the figure’s sub-panels clearer.
Lines 193-194: You wrote "... such as foveas (also known as the area centralis)". No, this is not the same.
(1) In some species, you have two foveas, one placed centrally in the retina, one placed temporally. So the fovea is not the area centralis.
(2) Second, some species do have an area centralis but without a fovea.
Thank you for pointing out the inaccuracy. In this case, we were referring specifically to the pigeon’s fovea, which is sometimes referred to as the “area centralis”, but we have now changed the sentence as follows to avoid any confusion (L174-175):
“The initial two hypotheses (Hypotheses 1 and 2) aim to examine whether foveation correlates with predator detection.”
Lines 192-212: I did not understand the logic of the hypothesis numbers. Why do you have 2.1 but not 3.1, for instance? And if you have two hypotheses within a global one (for instance, 2.1 and 2.2), what is the main hypothesis 2? You should explain more here because we get lost here and in the results section as well.
We recognize this section might have appeared confusing to the reader. In short, we had four main hypotheses: 1) the fovea is used to evaluate predator cues, 2) the latency to foveate is related to vigilance behaviors. These first 2 hypotheses aimed to determine if the latency to foveate on the predator cue could be related to the detection. 3) foveation is related to the escape response of the pigeons and 4) there is a collective influence in the escape response. We further divided some of the hypotheses into 2 sub-hypotheses whenever 2 different tests were used to answer the same question. We have modified this section to be clearer.
Lines 224-229: Where are the figures and statistics for these results?
These results are presented in Table S1. We apologize for forgetting to add this reference and have now added it (L211).
Lines 229-231: This should be in the method section.
This model explanation (as well as all others mentioned hereafter) has been moved to the method section as suggested.
Lines 248-252: This should be in the method section. Furthermore, you should better explain the model selection.
Please see earlier comment. Additionally, we are now better explaining how the model has been built.
Figure 2: It is not clear on the figure which letters correspond to which panels. Please improve the readability of the figure.
It was modified accordingly.
Lines 274-278: This should be in the method section.
Please see earlier comment.
Line 281: The "Fig.3" should be mentioned in the previous sentence.
It was modified accordingly.
Figure 3: Please explain why the latency to foveate had negative values in Fig. 2 but not here, and not in Fig. 4 either. This again highlights that the methods lack information about the transformation of the data and the model selection.
The variable presented in Fig 2d is not the latency to foveate but the “Normalized frequency at which the object was observed within foveal regions” (hypothesis 1). It represents the amount of time the object was lying within one of the foveal regions of the individual (“how long the pigeons foveated on it”), further normalized to unit sum to make all objects comparable. This variable was indeed logit-transformed (hence the negative value) to improve residual fit in the model, but this information (as well as other transformations) are always clearly stated on the axis caption of the graphs. Additionally, we now have improved the statistical analysis section to make the model used for each hypothesis testing clearer. But please let us know if you have suggestions for a further improvement in terms of presentation.
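The two steps described in this response (normalizing to unit sum, then logit-transforming) can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function names, the example durations and the clipping constant `eps` are assumptions. It shows why any proportion below 0.5 yields a negative value after the transform.

```python
import math


def normalize_to_unit_sum(durations):
    """Scale non-negative durations so that they sum to 1 across objects."""
    total = sum(durations.values())
    if total <= 0:
        raise ValueError("total duration must be positive")
    return {name: value / total for name, value in durations.items()}


def logit(p, eps=1e-6):
    """Log-odds of a proportion; p is clipped to (eps, 1 - eps) so 0 and 1 stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Hypothetical foveation durations (seconds) for three objects.
durations = {"monitor": 6.0, "conspecific": 3.0, "feeder": 1.0}
proportions = normalize_to_unit_sum(durations)
transformed = {name: logit(p) for name, p in proportions.items()}
```

With these made-up numbers, only the object with a share above 0.5 gets a positive logit; the other two are negative, which is the pattern the reviewer asked about.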
Lines 297-301: This should be in the method section.
Please see earlier comment.
Lines 301-305: Fig. 3 b and c only referred to the two first factors. Please add more figures for the other factors. This could be in supp. Mat.
We added the 3 graphs for the proportion of time foveating on the monitor, the saccade rate and the proportion of time foveating on conspecifics in the supplementary (Fig S6).
Lines 306-309: This should be in methods, and you should have explained in methods how you performed your model selection.....
We prefer leaving this paragraph in the result section, as it was intended to give the reader extra information on the predictive power of the different variables (by comparing the effectiveness of the models including one variable at a time, all the rest being equal) and not on the model selection per se. However, we now explain our goal better in the statistics section regarding this analysis (L635-636):
“We further tested the relative predictive power of the different test variables by comparing the resulting models’ efficiency using AIC scores.”
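The comparison described here can be sketched with the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; lower values indicate a better trade-off between fit and complexity. The model names and log-likelihood values below are hypothetical placeholders, not results from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(models):
    """Return (name, AIC, delta AIC) tuples sorted from best (lowest AIC) to worst.

    `models` maps a model name to a (log_likelihood, n_params) pair.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted(
        ((name, score, score - best) for name, score in scores.items()),
        key=lambda item: item[1],
    )


# Hypothetical models that differ only in the test variable they include.
candidate_models = {
    "head_up_proportion": (-120.4, 4),
    "saccade_rate": (-123.9, 4),
    "foveation_on_monitor": (-125.1, 4),
}
ranking = rank_by_aic(candidate_models)
```

Because the candidate models here have the same number of parameters, their ranking is driven entirely by the log-likelihood, which matches the "all the rest being equal" framing in the response.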
Lines 317-319: This should be in the method section.
Please see earlier comment.
Lines 320-322: This should be in the method section.
Please see earlier comment.
Lines 332-334: This should be in the method section.
Please see earlier comment.
Lines 334-336: Then, if this is not significant, you cannot say that.
Thank you for noticing the inaccuracy, we have now rephrased it as (L298-299):
“Earlier foveation of the first pigeon was not significantly related to an earlier escape responses among the other flock members, although there was a trend (χ2(1) = 3.66, p = 0.0559).”
Line 336: Please explain why you used different models. The methods lack a lot of information about your statistical strategy.
We have now added a lot more information on the models in the statistics section, according to this comment as well as the previous ones. We hope the explanations of the analyses are now clearer to the reader.
Lines 339-349: This should be in the method section.
Please see earlier comment.
Results section: As you may have understood, there are too many sentences that should be moved into the method section. Furthermore, I would recommend modifying the headings so that they are more biologically meaningful, similar to what you have done in the discussion section.
Thank you for the comments. We agree with most of them, and have modified the manuscript accordingly. Additionally, we now use the same headings in the results section as the ones used in the discussion to make the text easier to follow.
Lines 500-501: What was the body weight of the pigeons? At what percentage of their full weight were they?
This information is now added (492 ± 41g; mean ± SD). We did not control the amount of food during our experiments and only ensured 24h without food by feeding the pigeons after the experiment was completed. This information was added as follows (L454-456):
“On experimental days, they were fed only after the experiments was completed; this ensures 24-hour no feeding at the time of the experiment, although we did not control the amount of the food over the course of the experimental periods.”
Line 522-523: Those screens are very good for pigeons.
Thank you for the positive comment, we indeed tried to match bird vision as close as possible.
Lines 527-528: At which frequency was the moving stimulus produced? Your screen can display up to 144Hz, which is very good. But can your laptop do it? If not, it is important to mention it, as pigeons may have a temporal resolution of vision up to 149Hz.
Our laptop indeed supports 144Hz display. In addition, we now mention the temporal resolution of pigeon vision (L480-482).
“We specifically chose a monitor with high temporal resolution to match the pigeon’s Critical Flicker Fusion Frequency (threshold at which a flickering light is perceived by the eye as steady) that reaches up to 143Hz (Dodt & Wirth, 1954).”
Lines 555-572: Did you use a control shape in your experiment? Indeed, they may escape because of a moving pattern but not a predator shape?
We did not use a control shape, as the aim of the experiment was not to directly test the effect of the shape itself. We designed the predator cue to resemble an approaching predator to ensure a response from the pigeons, but it might be that other shapes would have worked as well.
Lines 588-589: Please explain why the coordinate system of the pigeon's head is considered as the visual field?
From what I have understood, you did not reconstruct the visual fields, but only the position of the fovea. This should be noted like this as visual field involves more than a sphere around the head (binocular and monocular sectors, blind sectors, vertical extension....).
Thank you for noticing the inaccuracy, we indeed did not consider other sectors of the visual field and therefore rephrased it as (L551): “the location of the objects and conspecifics from the pigeon’s perspective”.
Lines 601-604: How much does it represent?
As this was estimated by visual inspection, we do not have the exact percentage of data loss that was caused by grooming. However, because of the number of cameras in the SMART BARN motion capture system, it is reliable in detecting markers inside the space in “ideal” conditions (without occlusion). For example, a similar set-up found marker track loss of only <1% using a model bird (Itahara & Kano 2022)
Itahara, A., & Kano, F. (2022). “Corvid Tracking Studio”: A custom-built motion capture system to track head movements of corvids. Japanese Journal of Animal Psychology, 72(1), 1–16. https://doi.org/10.2502/janip.72.1.1
Lines 610-612: You would need to cite Wood 1917 and Hodos et al. 1991 who described the presence of a fovea in this species.
We added both citations to the manuscript.
Line 611: Again, the fovea is not equal to the area centralis.
Thank you, we changed it as well.
Lines 625-626: you wrote "... in a few instances....". Please explain more. How many? What proportion?
This happened in 9 observations out of 120. We now specify it in the text as well (L587-589):
“in a few instances (9 out of 120 observations), pigeons foveated on the model predator after the looming stimulus had disappeared, but these cases were excluded from our analysis.”
Lines 640-653: A lot of information is missing from the section "statistical analysis". If you moved most of the sentences from the results that describe the methods into the method section, that would be much better. Furthermore, you would need to explain in more detail which statistics you used, which model selection, what type of data transformation, etc.
We agree this section lacked information, and we moved the information from the results to the statistics section.
Supplementary materials: boxplots from Fig. S1 and S2 are too small and impossible to read. Please improve the readability.
We now have enlarged these plots to make them more readable.
-
eLife assessment
In this fundamental study, the authors use innovative fine-scale motion capture technologies to study visual vigilance with high-acuity vision, to estimate the visual fixation of free-feeding pigeons. The authors present compelling evidence for use of the fovea to inspect predator cues, the behavioral state influencing the latency for fovea use, and the use of the fovea decreasing the latency to escape of both the focal individual and other flock members. The work will be of broad interest to behavioral ecologists.
-
Public Review:
The authors used an innovative technique to study visual vigilance based on high-acuity vision (the fovea). Combining motion-capture features and the visual space around the head, the authors were able to estimate the visual fixation of free-feeding pigeons at any moment. Simulating predator attacks on screens, they showed that 1) pigeons used their fovea to inspect predator cues, 2) the behavioural state (feeding or head-up) influenced the latency to use the fovea and 3) the use of the fovea decreased the latency to escape of both the individual that foveated on the predator cues and the other flock members.
The paper is very interesting, and combines an innovative technique well adapted to studying the importance of high-acuity vision for spotting a predator and for improving the behavioural response (escaping). The results are strong and the models used are well-adapted. This paper is a major contribution to our understanding of the use of visual adaptation in a foraging context when at risk. This is also a major contribution to the understanding of individual interaction in a flock.
-
-
www.biorxiv.org www.biorxiv.org
-
Joint Public Review:
The manuscript "Engineering of PAClight1P78A: A High-Performance Class-B1 GPCR-Based Sensor for PACAP1-38" by Cola et al. presents the development of a novel genetically encoded sensor, PAClight1P78A, based on the human PAC1 receptor. The authors provide a thorough in vitro and in vivo characterization of this sensor, demonstrating its potential utility across various applications in life sciences, including drug development and basic research.
The main criticism of this manuscript after initial review is that the PACLight1 sensor has not been shown to detect the release of endogenous PACAP, whether in culture, in vivo, or ex vivo. The authors appear to be cognizant of this significant limitation (for a PACAP sensor) but no significant changes to address this limitation are provided in the revision.
While the sensor that is described here is new and the experimental results support the conclusions, the sensor reported here is not suited for the detection of endogenous PACAP release in vivo. In some respects, this manuscript could be seen as a stepping stone for further development either by the authors or other groups. Indeed, in many cases initial versions of genetically encoded sensors undergo substantial development post-publication, as exemplified by the evolution of GCaMP. However, the situation with the PAClight sensor reported here requires a different approach. Unlike GCaMP, which was one of the first genetically encoded calcium indicators, PAClight is another variant in a series of GPCR-fluorophore conjugates, following methodologies similar to those developed in the Lin Tian lab and the multiple GRAB-based sensors from Yulong Li's lab. These sensors have already demonstrated in vivo applicability, setting a standard that PAClight must meet or exceed to confirm its value and novelty.
Given that the title of the manuscript, "Probing PAC1 receptor activation across species with an engineered sensor," implies broader applicability, it potentially misleads readers about the sensor's utility in vivo, where "in vivo" should be understood as referring to the detection of endogenous PACAP release.
To align the manuscript with the expectations set by its title, it is crucial that the authors either provide substantial in vivo validation (ability to detect endogenous release of PACAP) or revise the title and the text to clarify that the sensor is primarily intended to detect exogenously applied PACAP. This clarification will ensure that the manuscript accurately reflects the sensor's current capabilities and scope of use.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
The manuscript "Engineering of PAClight1P78A: A High-Performance Class-B1 GPCR-Based Sensor for PACAP1-38" by Cola et al. presents the development of a novel genetically encoded sensor, PAClight1P78A, based on the human PAC1 receptor. The authors provide a thorough in vitro and in vivo characterization of this sensor, demonstrating its potential utility across various applications in life sciences, including drug development and basic research.
The diverse methods to validate PAClight1P78A demonstrate a comprehensive approach to sensor engineering by combining biochemical characterization with in vivo studies in rodent brains and zebrafish. This establishes the sensor's biophysical properties (e.g., sensitivity, specificity, kinetics, and spectral properties) and demonstrates its functionality in physiologically relevant settings. Importantly, the inclusion of control sensors and the testing of potential intracellular downstream effects such as G-protein activation underscore a careful consideration of specificity and biological impact.
Strengths:
The fundamental development of PAClight1P78A addresses a significant gap in sensors for Class-B1 GPCRs. The iterative design process, starting from PAClight0.1 and ending with the final PAClight1P78A variant, demonstrates compelling optimization. The innovative engineering results in a sensor with a high apparent dynamic range and excellent ligand selectivity, representing a significant advancement in the field. The rigorous in vitro characterization, including dynamic range, ligand specificity, and activation kinetics, provides a critical understanding of the sensor's utility. Including in vivo experiments in mice and zebrafish larvae demonstrates the sensor's applicability in complex biological systems.
Weaknesses:
The manuscript shows that the sensor fundamentally works in vivo, albeit in a limited capacity. The titration curves show sensitivity in the nmol range at which endogenous detection might be possible. However, perhaps the sensor is not sensitive enough or there are not any known robust paradigms for PACAP release. A more detailed discussion of the sensor's limitations, particularly regarding in vivo applications and the potential for detecting endogenous PACAP release, would be helpful.
We thank the reviewer for carefully analyzing our in vivo data and highlighting the limitation of our results regarding the sensor’s applicability in detecting endogenous PACAP. We added several sections discussing future possibilities for optimization in the discussion (see paragraphs 2-4). We agree that a more specific discussion of the limitations of our study is an important addition to help design future experiments.
There are several experiments with an n=1 and other low single-digit numbers. I assume that refers to biological replicates such as mice or culture wells, but it is not well defined. n=1 in experimental contexts, particularly in Figure 1, raises significant concerns about the exact dynamic range of the sensor, data reproducibility, and the robustness of conclusions drawn from these experiments. Also, ROI for cell cultures, like in Figure 1, is not well defined. The methods mentioned ROIs were manually selected, which appears very selective, and the values in Figure 1c become unnecessarily questionable. The lack of definition for "ROI" is confusing. Do ROIs refer to cells, specific locations on the cell membrane, or groups of cells? It would be best if the authors could use unbiased methods for image analysis that include the majority of responsive areas or an explanation of why certain ROIs are included or excluded.
We thank the reviewer for the helpful suggestions. We have increased the number of replicates to n=3 for both the HEK293T and neuron data depicted in Fig. 1c. Furthermore, we have added Fig. 1c’, which quantifies the maximum responses obtained in the dataset shown in Fig. 1c and also depicts the single values for each replicate. To clarify the definition of an ROI in our manuscript, we have detailed the process of ROI selection in the Methods section “Cell culture, imaging and quantification”. Additionally, we also increased mouse numbers for in vivo PACAP infusions in mice (see Figure 4g).
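As an illustration of the kind of unbiased ROI handling the reviewer asks for, the following is a minimal sketch, not the authors' procedure. It assumes that every segmented ROI is kept and that an ROI is called responsive when its post-application peak exceeds the baseline mean plus k baseline standard deviations. The function name, the threshold rule, the value of k, and the example traces are all hypothetical.

```python
from statistics import mean, stdev


def classify_rois(traces, n_baseline, k=3.0):
    """Flag each ROI as responsive or not, using one rule for all ROIs.

    traces: dict mapping an ROI id to its fluorescence samples.
    n_baseline: number of leading samples recorded before ligand application
                (must be at least 2 so that a standard deviation exists).
    k: hypothetical threshold, in baseline standard deviations.
    """
    flags = {}
    for roi_id, trace in traces.items():
        baseline = trace[:n_baseline]
        threshold = mean(baseline) + k * stdev(baseline)
        # Responsive if any post-baseline sample exceeds the threshold.
        flags[roi_id] = max(trace[n_baseline:]) > threshold
    return flags


# Made-up traces: one ROI that responds and one that does not.
example_traces = {
    "cell_1": [100, 102, 98, 101, 250, 260],
    "cell_2": [100, 101, 99, 100, 101, 100],
}
flags = classify_rois(example_traces, n_baseline=4)
responsive_fraction = sum(flags.values()) / len(flags)
```

Reporting the responsive fraction alongside the mean response over all ROIs, rather than over a hand-picked subset, would make the inclusion criteria explicit.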
Reviewer #2 (Public Review):
Summary:
The PAClight1 sensor was developed using an approach successful for the development of other fluorescence-based GPCR sensors, which is the complete replacement of the third intracellular loop of the receptor with a circularly-permuted green fluorescent protein. When expressed in HEK cells, this sensor showed good expression and a weak but measurable response to the extracellular presence of PACAP1-38 (a ΔF/F0 of 43%). Additional mutation near the site of insertion of the linearized GFP, at the C-terminus of the receptor, and within the second intracellular loop produced a final optimized sensor with a ΔF/F0 of >1000%. Finally, screening of mutational libraries that also included alterations in the extracellular ligand-binding domain of the receptor yielded a molecule, PAClight1P78A, that exhibited a high ligand-dependent fluorescence response combined with a high differential sensitivity to PACAP (EC50 30 nM based on cytometric sorting of stably transfected HEK293 cells) compared to its congener VIP (with which PACAP shares two highly related receptors, VPAC1 and VPAC2) as well as several unrelated neuropeptides, and significantly slowed activation kinetics by PACAP in the presence of a 10-fold molar excess of the PAC1 antagonist PACAP6-38. A structurally highly similar control construct, PAClight1P78Actl, showed correspondingly similar basal expression in HEK293 cells, but no PACAP-dependent enhancement in fluorescent properties.
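The response values quoted in this summary are fractional fluorescence changes relative to baseline. A minimal sketch of that calculation follows, assuming the baseline F0 is the mean of the samples recorded before ligand application. The function name and the example trace are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def delta_f_over_f0(trace, n_baseline):
    """Return the fluorescence change relative to baseline, in percent.

    trace: fluorescence samples for one ROI (arbitrary units).
    n_baseline: number of leading samples recorded before ligand application.
    """
    if not 0 < n_baseline <= len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    f0 = mean(trace[:n_baseline])
    if f0 <= 0:
        raise ValueError("baseline fluorescence must be positive")
    return [100.0 * (f - f0) / f0 for f in trace]


# Made-up trace: three baseline samples, then a rise after ligand addition.
trace = [100.0, 100.0, 100.0, 120.0, 143.0, 143.0]
response = delta_f_over_f0(trace, n_baseline=3)
peak = max(response)  # peak response in percent
```

With this definition, a sensor whose fluorescence rises from 100 to 143 arbitrary units has a peak response of 43%, and a response above 1000% corresponds to a more than 11-fold increase over baseline.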
PAClight1P78A was expressed in neurons of the mouse cortex via AAV9.hSyn-mediated gene transduction. Slices taken from PAClight1P78A-transfected cortex, but not slices taken from PAClight1P78Actl-transfected cortex, exhibited prompt and persistent elevation of ΔF/F0 after 2 minutes of perfusion with PACAP1-38, which persisted for up to 14 minutes and was statistically significant after perfusion with 3000, but not 300 or 30 nM, of peptide. Likewise, microinfusion of 200 nL of 300 µM PACAP1-38 into the cortex of optical fiber-implanted freely moving mice elicited a ΔF/F0 of greater than 15%, significantly higher than that elicited by application of similar concentrations of VIP, CRF, or enkephalin, or vehicle alone. In vivo experiments were carried out in zebrafish larvae by the introduction of PAClight1P78A into single-cell stage Danio rerio embryos using a Tol2 transposase-based plasmid with a UAS promoter via injection (of plasmid and transposase mRNA), and sorting of post-fertilization embryos using a marker for transgenesis carried in the UAS:PAClight1P78A construct. Expression of PAClight1P78A was directed to cells in the olfactory bulb which express the fish paralog of the human PAC1 receptor by using the Tg(GnRH3:gal4ff) line, and fluorescent signals were elicited by intracerebroventricular administration of PACAP1-38 at a single concentration (1 mM), which were specific to PACAP and to the presence of PAClight1P78A per se, as controlled by parallel experiments in which PAClight1P78Actl instead of PAClight1P78A was contained in the transgenic plasmid.
Major strengths and weaknesses of the methods and results
The report represents a rigorous demonstration of the elicitation of fluorescent signals upon pharmacological exposure to PACAP in nervous system tissue expressing PAClight1P78A in both mammals (mice) and fish (zebrafish larvae). Figure 4d shows a change in GFP fluorescence activation by PACAP occurring several seconds after the cessation of PACAP perfusion over a two-minute period, and its persistence for several minutes following. One wonders if one is apprehending the graphical presentation of the data incorrectly, or if the activation of fluorescence efficiency by ligand presentation is irreversible in this context, in which case the utility of the probe as a real-time indicator, in vivo, of released peptide might be diminished.
We thank the reviewer for their careful consideration of our manuscript and agree that the activation of PAClight persisting for several minutes at micromolar concentrations could be a potential limitation for in vivo applications. We added a possible explanation for the persisting sensor activation in response to artificial application of PACAP38 in paragraph 3 of the discussion. We agree that this addition eases the interpretation of PAClight signals detected in vivo.
Appraisal of achievement of aims, and data support of conclusions:
Small cavils with controls are omitted for clarity; the larger issue of appraisal of results based on the scope of the designed experiments is discussed in the section below. An interesting question related to the time dependence of the PACAP-elicited activation of PAClight1P78A is its onset and reversibility, and additional data related to this would be welcome.
We agree that the reversibility of the sensor’s fluorescence is indeed an important feature, especially for detecting endogenous PACAP release. Our data indicate that the sensor’s fluorescence is reversible when detecting small to medium doses of PACAP38 (see Figure 4d, application of 30-300 nM) that are presumably closer to physiological concentrations than the non-reversible concentration of 3000 nM. Please see also our new discussion on peptide concentrations in paragraph 4 of our discussion. For future experiments, it is indeed advisable to adjust the interval of repeated applications to the decay of the response at the respective concentration. Considering the long-lasting downstream effects of endogenous signaling, longer intervals between ligand applications are generally preferred to match more closely the physiological range in which endogenous PAC1 is most likely effective.
Discussion of the impact of the work, and utility of the methods and data:
Increasingly, neurotransmitter function may be observed in vivo, rather than by inferring in vivo function from in vitro, in cellular, or ex vivo experimentation. This very valuable report discloses the invention of a genetically encoded sensor for the class B1 GPCR PAC1. PAC1 is the major receptor for the neuropeptide PACAP, which in turn is a major neurotransmitter involved in brain response to psychogenic stress, or threat, in vertebrates as diverse as mammals and fishes. If this sensor possesses the sensitivity to detect endogenously released PACAP in vivo it will indeed be an impactful tool for understanding PACAP neurotransmission (and indeed PACAP action in general, in immune and endocrine compartments as well) in future experiments.
However, the sensor has not yet been used to detect endogenously released PACAP. Until this has been done, one cannot answer the question as to whether the levels of exogenously perfused/administered PACAP used here merely to calibrate the sensor's sensitivity are indeed unphysiologically high. If endogenous PACAP levels don't get that high, then the sensor will not be useful for its intended purpose. The authors should address this issue and allude to what kind of experiments would need to be done in order to detect endogenous PACAP release in living tissue in intact animals. The authors could comment upon the success of other GPCR sensors that have been used to observe endogenous ligand release, and where along the pathway to becoming a truly useful reagent this particular sensor is.
We thank the reviewer for highlighting the lack of clarity regarding the scope of this paper, which was not intended to cover the detection of endogenous PACAP release. We therefore expanded our discussion to encompass the intended purpose of detecting artificially infused or applied PAC1 agonists, such as conducting fundamental tests of drug specificity and developing new pharmacological ligands to selectively target PAC1. This includes a more detailed discussion of our in vivo findings and a clearer phrasing that stresses the potential application for applied drugs and not endogenous PACAP (see last paragraph in the discussion).
We also agree that little is known about endogenous concentrations of PACAP in the brain. However, we have supplemented our discussion with several references estimating lower concentrations of PACAP and other peptides in vivo, suggesting average PACAP levels below the detection threshold of the sensor. Importantly, within certain brain regions and in closer proximity to release sites, significantly higher concentrations might be reached. Additionally, our data indicate that the concentrations observed under our current conditions do not saturate the sensor in vivo.
We therefore acknowledge the reviewer’s comment on the sensor’s potential limitations under our current experimental conditions. Hence, we expanded our discussion and suggest the use of higher resolution imaging to potentially reveal loci of high PACAP concentrations, which should be validated by future studies (see also our added discussion in paragraph 4).
Reviewer #3 (Public Review):
Summary:
The manuscript introduces PAClight1P78A, a novel genetically encoded sensor designed to facilitate the study of class-B1 G protein-coupled receptors (GPCRs), focusing on the human PAC1 receptor. Addressing the significant challenge of investigating these clinically relevant drug targets, the sensor demonstrates a high dynamic range, excellent ligand selectivity, and rapid activation kinetics. It is validated across a variety of experimental contexts including in vitro, ex vivo, and in vivo models in mice and zebrafish, showcasing its utility for high-throughput screening, basic research, and drug development efforts related to GPCR dynamics and pharmacology.
Strengths:
The innovative design of PAClight1P78A successfully bridges a crucial gap in GPCR research by enabling real-time monitoring of receptor activation with high specificity and sensitivity. The extensive validation across multiple models emphasizes the sensor's reliability and versatility, promising significant contributions to both the scientific understanding of GPCR mechanisms and the development of novel therapeutics. Furthermore, by providing the research community with detailed methodologies and access to the necessary viral vectors and plasmids, the authors ensure the sensor's broad applicability and ease of adoption for a wide range of studies focused on GPCR biology and drug targeting.
Weaknesses:
To further strengthen the manuscript and validate the efficacy of PAClight1P78A as a selective PACAP sensor, it is crucial to demonstrate the sensor's ability to detect endogenous PACAP release in vivo under physiological conditions. While the current data from artificial PACAP application in mouse brain slices and microinfusion in behaving mice provide foundational insights into the sensor's functionality, these approaches predominantly simulate conditions with potentially higher concentrations of PACAP than naturally occurring levels.
We thank the reviewer for their valuable comments and agree that the use of PAClight for detecting endogenous PACAP will be of great interest to the scientific community and should be a goal for future research. Considering the time, equipment and additional animal licenses necessary, we are convinced that these questions would go beyond the scope of the current paper and might rather be addressed in a follow-up publication. We therefore rephrased the discussion and added more details to further clarify the intended purpose of the current study. Additionally, we added a paragraph in the discussion suggesting experiments needed to validate PAClight for putative future in vivo applications.
Although the sensor's specificity for the PAC1 receptor and its primary ligand is a pivotal achievement, exploring its potential application to other GPCRs within the class-B1 family or broader categories could enhance the manuscript's impact, suggesting ways to adapt this technology for a wider array of receptor studies. Additionally, while the sensor's performance is convincingly demonstrated in short-term experiments, insights into its long-term stability and reusability in more prolonged or repeated measures scenarios would be valuable for researchers interested in chronic studies or longitudinal behavioral analyses. Addressing these aspects could broaden the understanding of the sensor's practical utility over extended research timelines.
We extend our gratitude to the reviewer for diligently assessing our results.
Indeed, the very high level of sensitivity that we could achieve in PAClight leads us to think that potentially a grafting-based approach, such as the one we’ve recently described for class-A GPCR-based sensors (PMID: 37474807) could also work for the direct generation of multiple class-B1 sensors based on the optimized fluorescent protein module present in PAClight. Unfortunately, considering the amount of work that testing this hypothesis would entail, we are not able to perform these experiments in the context of this revision, and would rather pursue them as a future project. Nevertheless, we have expanded the discussion of the manuscript with a paragraph with these considerations.
While we lack comprehensive data on the long-term stability of the sensor, our preliminary findings from the optimization of photometry recordings indicate consistent baseline expression of PAClight and PAClight-ctrl over several weeks. Conducting experiments to systematically assess stability would require several months, which is currently impractical due to limitations in tools and licenses for repeated in vivo infusions. Hence, we intend to include these experiments in potential follow-up studies.
Furthermore, the current in vivo experiments involving microinfusion of PACAP near sensor-expressing areas in behaving mice are based on a relatively small sample size (n=2), which might limit the generalizability of the findings. Increasing the number of subjects in these experimental groups would enhance the statistical power of the results and provide a more robust assessment of the sensor's in vivo functionality. Expanding the sample size will not only validate the findings but also address potential variability within the population, thereby reinforcing the conclusions drawn from these crucial experiments.
We agree with the reviewer that a sample size of N=2 is not sufficient for in vivo recordings. We therefore increased the sample size and now present recordings with 5 PAClight1P78A and 4 PAClight-control mice. Of note, the new data validate our previous findings and conclusions and give a better idea of the in vivo variability, which we now cover in much more detail in the discussion (see paragraph 2).
Recommendations for the Authors:
Reviewer #1 (Recommendations For The Authors):
The lower potency of maxadilan activation might reflect broader implications for ligand-receptor dynamics. Perhaps the authors could discuss the maxadilan binding from a structural perspective, including AlphaFold models. Also, discussing how these findings might influence sensor application in diverse biological contexts would be insightful. Clear definitions and consistent use of these terms are crucial for ensuring that readers understand the methods and results.
We would like to thank the reviewer for the comments. As part of this work, we did not obtain a dose-response curve for the maxadilan peptide, and only reported the maximal response of the sensor to a high concentration of the peptide (10 µM). Thus, our findings inform us on the maximal efficacy of the peptide rather than its potency towards the PAC1R. Furthermore, we would like to point out that, due to the lack of structural details for any GPCR-based sensor published to date, we cannot draw any molecularly accurate conclusion regarding the precise reasons why a different ligand (in this case the sandfly maxadilan) induces a lower maximal efficacy of the response compared to the endogenous cognate ligand of the receptor. We do not believe that AlphaFold models can accurately replace structural information in this regard, especially given that the amino acid linker regions between the GPCR and the fluorescent protein, which are a critical determinant of allosteric chromophore modulation by ligand-induced conformational changes, typically obtain the lowest confidence score in all AlphaFold-predicted structural models of GPCR-based sensors. Finally, we would like to refer the reviewer to a very nice recent publication (PMID: 32047270) that resolved the structures of each of these peptides bound to the PAC1 receptor-Gs protein complex and provides accurate molecular details on the different modalities of receptor binding and activation by PACAP1-38 versus maxadilan.
Reviewer #2 (Recommendations For The Authors):
The authors are congratulated on the meticulous achievement of their aim, i.e. a fluorescence-based sensor for the detection of PACAP with in vivo utility. Whether or not this sensor will have the requisite sensitivity to detect the release of endogenous PACAP within various regions of the nervous system, in response to specific environmental stimuli or changes in brain or physiological state, remains to be determined.
We thank the reviewer for the very positive evaluation of our manuscript and for the suggested additions that will improve the strength of our arguments.
We agree that the in vivo detection of endogenous PACAP will be an important objective for future studies. Due to time, resource and animal license constraints, we are not able to address this objective in our current study, but we now detail possible future experiments in the discussion section. Please see also our earlier answers to the suggested discussion points.
Reviewer #3 (Recommendations For The Authors):
To comprehensively assess the sensor's sensitivity and specificity to endogenous PACAP, I recommend conducting additional in vivo experiments where PAClight1P78A is expressed in neurons that endogenously express the Pac1r receptor (using Adcyap1r1-Cre mouse line). These experiments should involve applying sensory or emotional stimuli known to evoke PACAP release or activating upstream PACAP-expressing neurons. Such studies would offer valuable data on the sensor's performance under natural physiological conditions and its potential utility for exploring PACAP's roles in vivo.
We express our gratitude to the reviewer for providing detailed methodological approaches to examine endogenous PACAP release. These suggestions will prove invaluable for future investigations and are important additions to a follow-up publication. As mentioned earlier, we have incorporated some of these approaches into our discussion. Additionally, we have underscored the existing limitations in detecting endogenous PACAP in vivo and emphasized the relevance of PAClight for drug development purposes.
-
eLife assessment
This fundamental paper reports a new biosensor to study G protein-coupled receptor activation by the pituitary adenylyl cyclase-activating polypeptide (PACAP) in cell culture, ex vivo (mouse brain slices), and in vivo (zebrafish, mouse). Convincing data are presented that show the new sensor works with high affinity in vitro, while requiring very high (non-physiological) concentrations of exogenous PACAP when applied to intact tissues. The sensor has not yet been used to detect endogenously released PACAP, raising questions about whether the sensor can be used for its intended purpose. While further work must be pursued to achieve broad in vivo applications under physiological conditions, the new tool will be of interest to cell biologists, especially those studying the large and significant GPCR family.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This useful study describes an antibody-free method to map G-quadruplexes (G4s) in vertebrate cells. While the method might have potential, the current analysis is primarily descriptive and does not add substantial new insights beyond existing data (e.g., PMID:34792172). While the datasets provided might constitute a good starting point for future functional studies, additional data and analyses would be needed to fully support the major conclusions and, at the same time, clarify the advantage of this method over other methods. Specifically, the strength of the evidence for DHX9 interfering with the ability of mESCs to differentiate by regulating directly the stability of either G4s or R-loops is still incomplete.
-
Reviewer #1 (Public Review):
Summary:
Non-B DNA structures such as G4s and R-loops have the potential to impact genome stability, gene transcription, and cell differentiation. This study investigates the distribution of G4s and R-loops in human and mouse cells using some interesting technical modifications of existing Tn5-based approaches. This work confirms that the helicase DHX9 could regulate the formation and/or stability of both structures in mouse embryonic stem cells (mESCs). It also provides evidence that the lack of DHX9 in mESCs interferes with their ability to differentiate.
Strengths:
HepG4-seq, the new antibody-free strategy to map G4s based on the ability of Hemin to act as a peroxidase when complexed to G4s, is interesting. This study also provides more evidence that the distribution pattern of G4s and R-loops might vary substantially from one cell type to another.
Weaknesses:
This study is essentially descriptive and does not provide conclusive evidence that lack of DHX9 does interfere with the ability of mESCs to differentiate by regulating directly the stability of either G4 or R-loops. In the end, it does not substantially improve our understanding of DHX9's mode of action.
There is no in-depth comparison of the newly generated data with existing datasets and no rigorous control was presented to test the specificity of the hemin-G4 interaction (a lot of the hemin-dependent signal seems to occur in the cytoplasm, which is unexpected).
The authors talk about co-occurrence between G4 and R-loops but their data does not actually demonstrate co-occurrence in time. If the same loci could form alternatively either R-loops or G4 and if DHX9 was somehow involved in determining the balance between G4s and R-loops, the authors would probably obtain the same distribution pattern. Manipulating R-loop levels in vivo and testing how this affects HepG4-seq signals would have been helpful.
This study relies exclusively on Tn5-based mapping strategies. This is a problem as global changes in DNA accessibility might strongly skew the results. It is unclear at this stage whether the lack of DHX9, BLM, or WRN has an impact on DNA accessibility, which might underlie the differences that were observed. Moreover, Tn5 cleaves DNA at a nearby accessible site, which might be at an unknown distance away from the site of interest. The spatial accuracy of Tn5-based methods is therefore debatable, which is a problem when trying to demonstrate spatial co-occurrence. Alternative mapping methods would have been helpful.
-
Reviewer #2 (Public Review):
Summary:
In this study, Liu et al. explore the interplay between G-quadruplexes (G4s) and R-loops. The authors developed novel techniques, HepG4-seq and HBD-seq, to capture and map these nucleic acid structures genome-wide in human HEK293 cells and mouse embryonic stem cells (mESCs). They identified dynamic, cell-type-specific distributions of co-localized G4s and R-loops, which predominantly localize at active promoters and enhancers of transcriptionally active genes. Furthermore, they assessed the role of helicase Dhx9 in regulating these structures and their impact on gene expression and cellular functions.
The manuscript provides a detailed catalogue of the genome-wide distribution of G4s and R-loops. However, the conceptual advance and the physiological relevance of the findings are not obvious. Overall, the impact of the work on the field is limited to the utility of the presented methods and datasets.
Strengths:
(1) The development and optimization of HepG4-seq and HBD-seq offer novel methods to map native G4s and R-loops.
(2) The study provides extensive data on the distribution of G4s and R-loops, highlighting their co-localization in human and mouse cells.
(3) The study consolidates the role of Dhx9 in modulating these structures and explores its impact on mESC self-renewal and differentiation.
Weaknesses:
(1) The specificity of the biotinylation process and potential off-target effects are not addressed. The authors should provide more data to validate the specificity of the G4-hemin.
(2) Other methods exploiting a catalytically dead RNase H or the HBD to pull down R-loops have been described before. The superior quality of the presented methods in comparison to existing ones is not established. A clear comparison with other methods (BG4 CUT&Tag-seq, DRIP-seq, R-ChIP, etc.) should be provided.
(3) Although the study demonstrates Dhx9's role in regulating co-localized G4s and R-loops, additional functional experiments (e.g., rescue experiments) are needed to confirm these findings.
(4) The manuscript would benefit from a more detailed discussion of the broader implications of co-localized G4s and R-loops.
(5) The manuscript lacks appropriate statistical analyses to support the major conclusions.
(6) The discussion could be expanded to address potential limitations and alternative explanations for the results.
-
Reviewer #3 (Public Review):
Summary:
The authors developed and optimized the methods for detecting G4s and R-loops independent of BG4 and S9.6 antibody, and mapped genomic native G4s and R-loops by HepG4-seq and HBD-seq, revealing that co-localized G4s and R-loops participate in regulating transcription and affecting the self-renewal and differentiation capabilities of mESCs.
Strengths:
By utilizing the peroxidase activity of the G4-hemin complex and combining it with proximity labeling technology, the authors developed HepG4-seq (high throughput sequencing of hemin-induced proximal labelled G4s), which can detect the dynamics of G4s in vivo. Meanwhile, the "GST-His6-2xHBD"-mediated CUT&Tag protocol (Wang et al., 2021) was optimized by replacing the fusion protein and tag; the optimized HBD-seq avoids the generation of GST fusion protein aggregates and can reflect the genome-wide distribution of R-loops in vivo.
The authors employed HepG4-seq and HBD-seq to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). The data indicate that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes.
Combined with Dhx9 ChIP-seq and co-localized G4s and R-loops data in wild-type and dhx9KO mESCs, the authors confirm that the helicase Dhx9 is a direct and major regulator that regulates the formation and resolution of co-localized G4s and R-loops.
Depletion of Dhx9 impaired the self-renewal and differentiation capacities of mESCs by altering the transcription of co-localized G4s and R-loops-associated genes.
In conclusion, the authors provide an approach to studying the interplay between G4s and R-loops, shedding light on the important roles of co-localized G4s and R-loops in development and disease by regulating the transcription of related genes.
Weaknesses:
At least two structures of the S9.6 antibody have been reported very recently, so the questions about the specificity of the S9.6 antibody for RNA:DNA hybrids should be considered settled. The references the authors cite (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against the S9.6 antibody also needs to be reconsidered. However, since the authors have questioned the specificity of the S9.6 antibody, they should compare their data in parallel with data generated using the widely used S9.6 antibody.
Although HepG4-seq is an effective G4s detection technique, and the authors have also verified its reliability to some extent, given the strong link between ROS homeostasis and G4s formation, and hemin's affinity for different types of G4s, whether HepG4-seq reflects the dynamics of G4s in vivo more accurately than existing detection techniques still needs to be more carefully corroborated.
-
Author response:
eLife assessment
This useful study describes an antibody-free method to map G-quadruplexes (G4s) in vertebrate cells. While the method might have potential, the current analysis is primarily descriptive and does not add substantial new insights beyond existing data (e.g., PMID:34792172). While the datasets provided might constitute a good starting point for future functional studies, additional data and analyses would be needed to fully support the major conclusions and, at the same time, clarify the advantage of this method over other methods. Specifically, the strength of the evidence for DHX9 interfering with the ability of mESCs to differentiate by regulating directly the stability of either G4s or R-loops is still incomplete.
We thank the editors for their helpful comments.
Given that antibody-based methods have been reported to leave open the possibility of recognizing partially folded G4s and promoting their folding, we have employed the peroxidase activity of the G4-hemin complex to develop a new method for capturing endogenous G4s that significantly reduces the risk of capturing partially folded G4s. We will be happy to clarify the advantage of our method.
In Fig. 7, we applied the Dhx9 CUT&Tag assay to identify the G4s and R-loops directly bound by Dhx9 and further characterized the differential Dhx9-bound G4s and R-loops in the absence of Dhx9. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Furthermore, we showed that depletion of Dhx9 significantly altered the levels of G4s or R-loops around the TSS or gene bodies of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, and also their RNA levels (Fig. 7I). We consider the above evidence sufficient to support the conclusion that Dhx9 transcriptionally regulates mESC fate by directly modulating the G4s or R-loops within these key regulators.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
Non-B DNA structures such as G4s and R-loops have the potential to impact genome stability, gene transcription, and cell differentiation. This study investigates the distribution of G4s and R-loops in human and mouse cells using some interesting technical modifications of existing Tn5-based approaches. This work confirms that the helicase DHX9 could regulate the formation and/or stability of both structures in mouse embryonic stem cells (mESCs). It also provides evidence that the lack of DHX9 in mESCs interferes with their ability to differentiate.
Strengths:
HepG4-seq, the new antibody-free strategy to map G4s based on the ability of Hemin to act as a peroxidase when complexed to G4s, is interesting. This study also provides more evidence that the distribution pattern of G4s and R-loops might vary substantially from one cell type to another.
We appreciate your valuable points.
Weaknesses:
This study is essentially descriptive and does not provide conclusive evidence that lack of DHX9 does interfere with the ability of mESCs to differentiate by regulating directly the stability of either G4 or R-loops. In the end, it does not substantially improve our understanding of DHX9's mode of action.
In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. Dhx9 has been reported to directly unwind R-loops and G4s or promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites with G4s or R-loops. We found that 47,857 co-localized G4s and R-loops are directly bound by Dhx9 in wild-type mESCs and that 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators exist as well. We showed that depletion of Dhx9 significantly altered the RNA levels of several key regulators of mESCs and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, which coincides with significantly differential levels of G4s or R-loops around the TSS or gene bodies of these genes (Fig. 7). The comprehensive molecular mechanism of Dhx9 action is indeed not the focus of this study. We will work on it in future studies. Thank you for the comments.
There is no in-depth comparison of the newly generated data with existing datasets and no rigorous control was presented to test the specificity of the hemin-G4 interaction (a lot of the hemin-dependent signal seems to occur in the cytoplasm, which is unexpected).
The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity. To identify the specific signals, we included a non-label control and used this control to call confident HepG4 peaks in all HepG4-seq assays.
The hemin-RNA G4 complex has also been reported to have peroxidase-mimicking activity and to trigger self-biotinylation signals similar to those of DNA G4s (PMID: 32329781, 31257395, 27422869). Therefore, it is not surprising to observe hemin-dependent signals in the cytoplasm generated by cytoplasmic RNA G4s.
In the revised version, we will include a careful comparison between our data and previous datasets.
The authors talk about co-occurrence between G4 and R-loops but their data does not actually demonstrate co-occurrence in time. If the same loci could form alternatively either R-loops or G4 and if DHX9 was somehow involved in determining the balance between G4s and R-loops, the authors would probably obtain the same distribution pattern. Manipulating R-loop levels in vivo and testing how this affects HepG4-seq signals would have been helpful.
Single-molecule fluorescence studies have shown the existence of a positive feedback mechanism of G4 and R-loop formation during transcription (PMID: 32810236, 32636376), suggesting that G4s and R-loops could co-localize on the same molecule. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Although depletion of Dhx9 resulted in 6,171 Dhx9-bound co-localized G4s and R-loops with significantly altered levels of G4s or R-loops, only 276 of them (~4.5%) harbored both altered G4s and altered R-loops, suggesting that interacting G4s and R-loops are rare in living cells. Currently, the genome-wide co-occurrence of two factors is mainly obtained by bioinformatic intersection analysis. We agree that heterogeneous distribution between cells will give false-positive co-occurrence patterns. We will carefully discuss this point in the revised version. At the same time, we will make efforts to develop a new method to map co-localized G4s and R-loops on the same molecule in a future study.
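The bioinformatic intersection analysis mentioned in this response can be illustrated with a minimal sketch. It assumes that peaks are half-open (chromosome, start, end) intervals and that any overlap of at least one base counts as co-localization; the function name `intersect_peaks`, the overlap rule, and the toy coordinates are illustrative assumptions, not the pipeline used in the study. Because the overlap is computed on population-level peak sets, it cannot by itself show that both structures form on the same molecule, which is the reviewer's concern.

```python
from collections import defaultdict


def intersect_peaks(peaks_a, peaks_b, min_overlap=1):
    """Return (a, b) pairs of peaks on the same chromosome that overlap.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    The min_overlap threshold is an illustrative assumption.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    pairs = []
    for chrom, a_start, a_end in peaks_a:
        for b_start, b_end in by_chrom[chrom]:
            # Length of the shared segment; zero or negative means no overlap.
            overlap = min(a_end, b_end) - max(a_start, b_start)
            if overlap >= min_overlap:
                pairs.append(((chrom, a_start, a_end), (chrom, b_start, b_end)))
    return pairs


# Hypothetical toy peak sets, not data from the study.
g4_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
rloop_peaks = [("chr1", 150, 300), ("chr2", 80, 120)]

colocalized = intersect_peaks(g4_peaks, rloop_peaks)

# Fraction quoted in the response: 276 of 6,171 sites.
fraction_both_altered = 276 / 6171
```

In this toy example only the first chr1 peak pair is reported; the chr2 peaks merely touch at position 80 and share no base under half-open coordinates. The quoted fraction works out to roughly 4.5%, matching the figure given in the response.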
This study relies exclusively on Tn5-based mapping strategies. This is a problem as global changes in DNA accessibility might strongly skew the results. It is unclear at this stage whether the lack of DHX9, BLM, or WRN has an impact on DNA accessibility, which might underlie the differences that were observed. Moreover, Tn5 cleaves DNA at a nearby accessible site, which might be at an unknown distance away from the site of interest. The spatial accuracy of Tn5-based methods is therefore debatable, which is a problem when trying to demonstrate spatial co-occurrence. Alternative mapping methods would have been helpful.
In this study, we used the recombinant streptavidin monomer and anti-GP41 nanobody fusion protein (mSA-scFv) to specifically recognize hemin-G4-induced biotinylated G4s and then recruit the recombinant GP41-tagged Tn5 protein to these G4 sites. Similarly, the recombinant V5-tagged N-terminal hybrid-binding domain (HBD) of RNase H1 specifically recognizes R-loops and recruits the recombinant protein G-Tn5 (pG-Tn5) with the help of an anti-V5 antibody. Therefore, the spatial distance of Tn5 to the target sites is well controlled and very short, and the recruitment of Tn5 is specifically determined by the presence of G4s in HepG4-seq and of R-loops in HBD-seq.
Reviewer #2 (Public Review):
Summary:
In this study, Liu et al. explore the interplay between G-quadruplexes (G4s) and R-loops. The authors developed novel techniques, HepG4-seq and HBD-seq, to capture and map these nucleic acid structures genome-wide in human HEK293 cells and mouse embryonic stem cells (mESCs). They identified dynamic, cell-type-specific distributions of co-localized G4s and R-loops, which predominantly localize at active promoters and enhancers of transcriptionally active genes. Furthermore, they assessed the role of helicase Dhx9 in regulating these structures and their impact on gene expression and cellular functions.
The manuscript provides a detailed catalogue of the genome-wide distribution of G4s and R-loops. However, the conceptual advance and the physiological relevance of the findings are not obvious. Overall, the impact of the work on the field is limited to the utility of the presented methods and datasets.
Strengths:
(1) The development and optimization of HepG4-seq and HBD-seq offer novel methods to map native G4s and R-loops.
(2) The study provides extensive data on the distribution of G4s and R-loops, highlighting their co-localization in human and mouse cells.
(3) The study consolidates the role of Dhx9 in modulating these structures and explores its impact on mESC self-renewal and differentiation.
We appreciate your valuable points.
Weaknesses:
(1) The specificity of the biotinylation process and potential off-target effects are not addressed. The authors should provide more data to validate the specificity of the G4-hemin.
The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity.
(2) Other methods exploring a catalytic dead RNAseH or the HBD to pull down R-loops have been described before. The superior quality of the presented methods in comparison to existing ones is not established. A clear comparison with other methods (BG4 CUT&Tag-seq, DRIP-seq, R-CHIP, etc) should be provided.
Thank you for the suggestions. We will include the comparisons in the revised version.
(3) Although the study demonstrates Dhx9's role in regulating co-localized G4s and R-loops, additional functional experiments (e.g., rescue experiments) are needed to confirm these findings.
Dhx9 has been demonstrated in previous studies to be a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). We believe that the current new dataset and previous studies are enough to support the capability of Dhx9 in regulating co-localized G4s and R-loops.
(4) The manuscript would benefit from a more detailed discussion of the broader implications of co-localized G4s and R-loops.
Thank you for the suggestions. We will include a more detailed discussion in the revised version.
(5) The manuscript lacks appropriate statistical analyses to support the major conclusions.
We apologize for this point. Although we applied careful statistical analyses in this study, the lack of some statistical details makes some conclusions hard to understand. We will carefully add the details of all statistical analyses.
(6) The discussion could be expanded to address potential limitations and alternative explanations for the results.
Thank you for the suggestions. We will include a more detailed discussion about this point in the revised version.
Reviewer #3 (Public Review):
Summary:
The authors developed and optimized the methods for detecting G4s and R-loops independent of BG4 and S9.6 antibody, and mapped genomic native G4s and R-loops by HepG4-seq and HBD-seq, revealing that co-localized G4s and R-loops participate in regulating transcription and affecting the self-renewal and differentiation capabilities of mESCs.
Strengths:
By utilizing the peroxidase activity of the G4-hemin complex and combining it with proximity labeling technology, the authors developed HepG4-seq (high throughput sequencing of hemin-induced proximal labelled G4s), which can detect the dynamics of G4s in vivo. Meanwhile, the "GST-His6-2xHBD"-mediated CUT&Tag protocol (Wang et al., 2021) was optimized by replacing the fusion protein and tag; the optimized HBD-seq avoids the generation of GST fusion protein aggregates and can reflect the genome-wide distribution of R-loops in vivo.
The authors employed HepG4-seq and HBD-seq to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). The data indicate that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes.
Combined with Dhx9 ChIP-seq and co-localized G4s and R-loops data in wild-type and dhx9KO mESCs, the authors confirm that the helicase Dhx9 is a direct and major regulator that regulates the formation and resolution of co-localized G4s and R-loops.
Depletion of Dhx9 impaired the self-renewal and differentiation capacities of mESCs by altering the transcription of co-localized G4s and R-loops-associated genes.
In conclusion, the authors provide an approach to studying the interplay between G4s and R-loops, shedding light on the important roles of co-localized G4s and R-loops in development and disease by regulating the transcription of related genes.
We appreciate your valuable points.
Weaknesses:
As far as we know, at least two structures of the S9.6 antibody have been reported very recently, so the questions about the specificity of the S9.6 antibody for RNA:DNA hybrids should be considered settled. The references cited by the authors (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against the S9.6 antibody also needs to change. However, since the authors have questioned the specificity of the S9.6 antibody, they should compare their data in parallel with data generated using the widely used S9.6 antibody.
Thank you for the updated information about the structural data on the S9.6 antibody. We respectfully disagree regarding the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show that S9.6 is more specific for RNA:DNA hybrids than for dsRNA and dsDNA. However, Fabian K. et al. have reported that the binding affinity of S9.6 for RNA:DNA hybrids exhibits an obvious sequence-dependent bias, ranging from null to the nanomolar range (PMID: 28594954). We will include the comparison between S9.6-derived data and our HBD-seq data in the revised version.
Although HepG4-seq is an effective G4s detection technique, and the authors have also verified its reliability to some extent, given the strong link between ROS homeostasis and G4s formation, and hemin's affinity for different types of G4s, whether HepG4-seq reflects the dynamics of G4s in vivo more accurately than existing detection techniques still needs to be more carefully corroborated.
Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activities than anti-parallel G4s. Thus, the dynamics of G4 conformation could affect the HepG4-seq signals (PMID: 32329781). In the future, it may be necessary to combine HepG4-seq and BG4-seq to carefully interpret endogenous G4s. We will carefully discuss this point in the revised version.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This valuable paper compares blood gene signature responses between small cohorts of individuals with mild and severe COVID-19. The authors provide solid evidence for distinct transcriptional profiles during early COVID-19 infections that may be predictive of severity, within the limitations of studying human patients displaying heterogeneity in infection timelines and limited cohort size.
-
Reviewer #1 (Public Review):
Summary:
Medina et al, 2023 investigated the peripheral blood transcriptional responses in patients with diversifying disease outcomes. The authors characterized the blood transcriptome of four non-hospitalized individuals presenting mild disease and four patients hospitalized with severe disease. These individuals were observed longitudinally at three timepoints (0-, 7-, and 28-days post recruitment), and distinct transcriptional responses were observed between severe hospitalized patients and mild non-hospitalized individuals, especially during 0- and 7-day collection timepoints. Particularly, the authors found that increased expression of genes associated with NK cell cytotoxicity is associated with mild outcomes. Additional co-regulated gene network analyses positively correlate T cell activity with mild disease and neutrophil degranulation with severe disease.
Strengths:
The longitudinal measurements in individual participants at consistent collection intervals can offer an added dimension to the dataset that involves temporal trajectories of genes associated with disease outcomes and is a key strength of the study. The use of co-expressed gene networks specific to the cohort to complement enrichment results obtained from pre-determined gene sets can offer valuable insights into new associations/networks associated with disease progression and warrants further analyses on the biological functions enriched within these co-expressed network modules.
Weaknesses:
There is a large difference in the infection timeline (onset of symptoms to recruitment) between the mild and severe patient cohorts. As immune responses during early infection can be highly dynamic, the differences in infection timeline may bias transcriptional signatures observed between the groups. The study is also limited by a small cohort size.
Comments on revised version:
The authors have addressed the specific concerns brought forth by the reviewers.
-
Reviewer #2 (Public Review):
In their manuscript, Medina and colleagues investigate transcriptional differences between mild and severe SARS-CoV-2 infections. Their analyses are very comprehensive, incorporating a multitude of bioinformatics tools ranging from PCA plots, GSEA and DEG analysis, and protein-protein interaction networks to weighted correlation network analyses. They conclude that in mild COVID-19 infection NK cell functionality is compromised and this is connected to cytokine interactions and Th1/Th2 cell differentiation pathways cross-talk, bridging the innate and the adaptive arms of the immune system. The authors successfully recruited participants with both mild and severe COVID-19 between November 2020 and May 2021. The analyzed cohort is gender-matched and acceptably age-matched, and the results reported are promising. Signatures associated with NK cell cytotoxicity in the mild group and neutrophil functions in the severe group during acute infection are the chief findings reported in this manuscript.
Comments on revised version:
The authors responded appropriately to the previous review critiques.
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1 (Recommendations For The Authors):
(1) Due to the significant difference between the infection timeline of mild (1 day post symptom onset) and severe (10 days post symptom onset) cohort at enrollment, an informative analysis to consider is to compare timepoint 2 from the mild cohort to timepoint 1 from the severe cohort.
In agreement with the reviewer's comment, we completed the analysis comparing timepoint 2 from the mild cohort with timepoint 1 from the severe cohort, which is now included as Figure 4-figure supplement 5. The new text explaining this analysis is on pages 13-14, lines 346-355. We also included a paragraph in the discussion on page 22, lines 595-604. We chose to show this comparison to reinforce the main observation that Natural Killer cytotoxicity pathways are enriched in all analyses of this work.
(2) Alternatively, as this information is available, the authors may group the samples based on the individual's infection timeline as opposed to the recruitment timeline.
Patients in both groups were enrolled at the peak of their symptoms. We grouped the patients according to this criterion to generate more significant results. Since these infections occurred naturally, we have no accurate information regarding when the patients were infected. Moreover, if the samples were grouped by individual infection timeline, the analysis would be too weak statistically to draw conclusions about the course of COVID-19, as disease progression would not be coordinated. Our grouping approach gave us a good confidence range despite the small population evaluated.
(3) The authors selected three co-regulated network modules based on the size of module membership genes, selecting the three modules containing the largest gene membership. Small co-regulated networks can also offer important biological insights into specific molecular machinery associated with disease outcomes.
Figure 5 was updated to include two more networks (in addition to blue), for the brown and turquoise modules (5E and 5F). This new information allowed us to understand in greater depth the three largest modules, which gave the most significant results owing to the number of genes they include (blue: 704, brown: 508, and turquoise: 712). The new text describing this analysis is included on page 15, lines 388-396. The remaining 7 modules were also analyzed, and their Gene Ontology/pathway enrichments were included in two new supplemental figures (Figure 5 - figure supplement 1 and 2). The new text describing this analysis is included on page 15, lines 397-401.
(4) An alternative selection criterion that can inform biological associations between module genes and disease severity is the strength of the correlation coefficients. It seems from Figure 5B, that yellow, turquoise, and green modules have a moderate positive correlation with severe patients, while brown, blue, and gray modules show a slight positive correlation with mild outpatients. A recommendation for the authors is to consider revising Figure 5C to include the enrichment of these additional modules and include these modules in the interpretation of the results.
The correlations between the cohorts and the blue, brown, and turquoise modules are clearly identified for severe or mild patients. However, for several smaller modules, the correlations are heterogeneous across patients within the cohorts, making it hard to draw a clear conclusion about the severity groups. The 7 remaining modules were therefore analyzed as indicated in response #3 above, and the results give an idea of the different transcriptional programs present in different patients at different stages of disease. However, the small number of genes in some modules yields weak GO and pathway enrichment results, which makes them difficult to interpret. The text describing this figure is included on page 15, lines 397-401. In addition, the network analyses for the brown and turquoise modules were included in Figure 5 as 5E-F, and the text detailing these figures was included on page 15, lines 388-396.
(5) In Figures 3E and 3F, the authors present enrichment analyses of differentially expressed genes from day 28. However, earlier in the results (lines 226-228), the authors reported no differentially expressed genes observed between the mild and severe participant cohort at this time point. Can the authors clarify which comparison was performed to obtain the list of differentially expressed genes used in the enrichment analyses in Figures 3E and 3F?
The discrepancy stems from the different criteria employed in each comparison. The DEG list from the pairwise comparison differs from that of the longitudinal comparison mentioned afterwards, because for the latter analysis we selected only the genes with different trajectories throughout the study (Figure 3). To clarify this point, we included a new paragraph on page 11, lines 278-285.
Original:
“We detected 828 genes that exhibited temporal and quantitative expression level differences during the progression of disease. We discovered additional biological processes and KEGG pathways that were differentially enriched during the COVID-19 progression in mild and severe patients (Figure 3) using the Enrichr platform (G. Chen et al., 2020)”
Changed to:
“To do so, we first identified genes that were differentially expressed between severity groups, and second, we chose only those that also showed changes in their trajectories across sampling times. In doing so, we found 828 genes that exhibited temporal differences in expression level during disease progression. Then using the Enrichr platform (G. Chen et al., 2020), we discovered additional biological processes and KEGG pathways that were differentially enriched during the COVID-19 progression in mild and severe patients (Figure 3).”
(6) Additionally, the authors refer to specific enriched genes in Figure 3 (lines 298-302), but Figure 3 only displays the enriched terms. Can the authors include the results from the enrichment analysis that include gene membership for each enriched term in the supplement?
Indeed, the initial version contained no figure or table with the gene lists for this analysis. We have now included Supplementary Tables 1 and 2, which detail each pathway along with its gene list.
(7) In line 104, can the authors clarify the parameters used to define well-matched samples?
Based on the reviewers' observations, we changed the wording to make the message of the paper clearer. The update was included on page 5, line as follows:
Original:
“Here, we designed a longitudinal investigation using well-matched samples to study how changes in gene expression in distinct immune effector cells changed during the earliest time points after diagnosis and during progression of clinical disease”,
Changed to:
“Here, we designed a longitudinal comparison between mild and severe patients, choosing the appropriate samples according to the clinical progression and the unbiased gene expression profile”
(8) In lines 113-116, can the authors clarify how their approach mitigates noise/potential biases and very briefly, describe what the nature of noise/biases could be?
The main goal of this paragraph is to show that, although several pathways reached statistical significance in our analyses, we focused on NK cell cytotoxicity because this molecular pathway bridges other relevant immune responses; the pathways were therefore chosen because of their intricate transcriptional program rather than a biased interest. The text was edited and included on page 6, lines 111-131, as follows:
Original:
“We used a pairwise comparison of gene expression, gene set enrichment, and weight-correlated gene network analyses to detect differential expression of genes involved with the cytotoxic signaling pathway of Natural Killer (NK) cells in mild verses severe progression of disease. We promoted a broad and integrated point of view throughout the transcriptomic analysis of functional pathways to mitigate noise and potential biases (Bastard et al., 2020; Delorey et al., 2021; Schultze & Aschenbrenner, 2021; S. Zhang et al., 2022). We found close connectivity between NK signaling pathway genes and those of cytokine-cytokine receptor signaling pathways, along with Th1/Th2 cell differentiation genes, as part of the transcriptional circuit executed preferentially among mildly ill patients. Our results detected transcriptional circuits engaging multiple regulatory checkpoints. These findings indicated that the innate NK signaling pathway (cell cytotoxic activity) is beneficial, perhaps a critically-necessary activity needed to effectively eradicate coronavirus. We interpreted that an adaptive immune response that included early cell-mediated immunity was important for reducing disease severity in mild patients. This balance between humoral- and cell-mediated immunity appeared to be less robust in patients presenting with severe COVID-19. These results detected components of the immune response that were significantly associated with the differences in symptom severity observed between mild and severely ill COVID-19 patients.”
Changed to:
“Briefly, to gain more insights into our findings and complement their functional context, we used a pairwise comparison of gene expression, gene set enrichment, and weight-correlated gene network analyses. By doing so, we identified pathways of genes involved with the NK cell cytotoxicity enriched in mild patients when compared to severe. Besides focusing on a particular molecular pathway, we investigated the interactions to better comprehend the underlying phenomena of a successful immune response, contributing to an integrated point of view throughout the transcriptomic analyses of functional pathways to mitigate potential biases attributed to focusing the study on a single pathway. In this regard, we revealed that the NK signaling pathway was intricately related to other transcriptional circuits, such as those governing Th1/Th2 cell differentiation and cytokine-cytokine receptor signaling pathways. These interactions highlight the importance of these pathways as bridges between the innate and adaptive immune responses throughout the disease, implying that the innate NK signaling pathway (cell cytotoxic activity) is beneficial, and possibly a critical activity required to effectively eradicate coronavirus. We also concluded that an adaptive immune response including early cell-mediated immunity was significant in lowering disease severity. The link between the primary innate NK cell activity and the transcriptional priming of adaptive Th1 and Th2 cell responses appears to be more robust in mild patients than in severe.”
(9) In line 120, can the authors clarify which regulatory checkpoints were being referred to?
The concept of “checkpoint” was changed to “bridges” (line 124) because it offers a clearer idea of the molecular interactions displayed across the different enriched pathways described in our study. In this sense, the bridges show the connection between the innate immune response by NK cells and the adaptive immune response by Th1/Th2 cells.
(10) In lines 125-126, can the authors refer to specific results to support this observation?
Lines 111 to 129 summarize the results of the analysis that support the aforementioned statement. However, the original sentence was modified for better comprehension on page 6, lines 129-131, as follows:
Original:
“This balance between humoral- and cell-mediated immunity appeared to be less robust in patients presenting with severe COVID-19”
Changed to:
“The link between the primary innate NK cell activity and the transcriptional priming of adaptive Th1 and Th2 cell responses appears to be more robust in mild patients than in severe.”
(11) In lines 184-185, can the authors clarify what the term "mixed" specifically refers to?
The original text was modified for better comprehension on page 8, lines 177-179 as follows:
Original:
“Interestingly, on day-28, when the majority of patients had recovered, samples from severely ill patients were still mixed compared to those with mild symptoms.”
Changed to:
“Interestingly, on day-28, when the majority of patients had recovered, samples from severely ill patients were pooled together with those mild patients who had already recovered”.
(12) In line 286, can the authors clarify how quantitative expression level differences are distinct from temporal expression level differences?
Although enrollment times differed between the mild and severe cohorts, patients in both were enrolled precisely at the peak of their COVID-19 symptoms, as illustrated in Figure 1B. In support of this criterion, the longitudinal analysis outlined in Figure 3 took into account the changes in gene expression trajectories across all sampling times. This point is significant because the results revealed several transcriptional programs that were dynamically executed during disease progression, independently of the pairwise comparisons carried out previously.
(13) In Figure 1C, there seemed to be two data points associated with "M1 0 days" and "M4 28 days" with distinct PC projections. Could these samples be mislabeled?
The figure was revised and completed. The hexagon symbol for day-28 was replaced with a star symbol. The “M1 0 days” and “M4 28 days” samples were labeled correctly. The revised Figure 1C shows these changes.
(14) In Figure 1D caption: could authors clarify if the ranking of 100 genes was based on the log2FC or adjusted p-values?
The criteria considered were Fold Change ≥ 2 and FDR ≤ 0.05, as stated in the methodology on page 23, lines 657-660.
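The two cut-offs stated in this response can be written down as a short filter. This is a sketch only: the function name `is_deg` and the example genes are hypothetical, and the code assumes the fold-change threshold is applied symmetrically (a gene that is at least two-fold down also passes), which the response does not state explicitly.

```python
import math


def is_deg(fold_change, fdr, min_fold_change=2.0, max_fdr=0.05):
    """Apply the stated cut-offs: fold change >= 2 and FDR <= 0.05.

    Assumption: the fold-change threshold is symmetric, so a gene that is
    at least two-fold down (fold change <= 0.5) also passes.
    """
    log2_fc = math.log2(fold_change)
    return abs(log2_fc) >= math.log2(min_fold_change) and fdr <= max_fdr


# Hypothetical genes: (name, fold change, FDR). Not data from the study.
genes = [
    ("GENE_A", 4.0, 0.001),  # strongly up, significant
    ("GENE_B", 1.5, 0.001),  # significant, but below the fold-change cut-off
    ("GENE_C", 0.25, 0.04),  # four-fold down, significant
    ("GENE_D", 3.0, 0.20),   # large change, but FDR too high
]

degs = [name for name, fc, fdr in genes if is_deg(fc, fdr)]
```

Under these assumptions only GENE_A and GENE_C pass. Ranking the surviving genes, which is what the reviewer asked about, would be a separate step that these thresholds do not determine.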
(15) In Figure 4D, can the authors include the expression z score for the healthy participants?
We could include this information, but we do not think it would aid understanding of this figure, whose focus is on the differential trajectories between mild and severe patients. In addition, the DEGs for the mild and severe cohorts in this and every other analysis in this work were obtained relative to healthy donors.
(16) Related to this, can the authors clarify if the expression z scores were computed using the mean and standard deviations of all samples within the study or relative to a specific participant cohort?
The z-scores were computed from the mild and severe patients, using the mean and standard deviation of each group. A new paragraph was included in the Materials and Methods on page 24, lines 662-664.
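The reviewer's question is whether the z-scores use one mean and standard deviation across samples or separate ones per cohort. The difference can be seen in a small sketch with made-up expression values; the function name `z_scores` and both readings are illustrative assumptions, since the response does not spell out the exact computation.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using their own mean and sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical expression values for one gene; not data from the study.
mild = [8.0, 9.0, 10.0, 9.0]
severe = [5.0, 6.0, 4.0, 5.0]

# Reading 1: one mean and SD over mild and severe samples pooled.
pooled = z_scores(mild + severe)

# Reading 2: mean and SD computed separately within each group.
per_group = z_scores(mild) + z_scores(severe)
```

With pooled scaling, the difference between cohorts is preserved (mild samples score above zero, severe samples below). With per-group scaling, each cohort is centered on zero, so between-cohort differences disappear and only within-cohort variation remains.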
(17) In Figure 5B, can the authors include column annotations for participants and sampling time points?
Figure 5B was updated and completed with the suggested information.
(18) In Figure 1 - Figure Supplement 2, can the authors include the volcano plot from the pairwise comparison for day 28 showing no differentially expressed genes between mild and severe participants as reported in the results (lines 226-228)?
The third volcano plot for day 28 was included in the updated figure 1 supplement 2.
Reviewer #2 (Recommendations For The Authors):
The manuscript is generally very well-constructed and well-written. However, the following are the major concerns mostly regarding the study design and participant selection.
(1) The authors have used enrolment day as D0, which is not reflective of the immune response timeline, especially when the designated 'D0' for the severe group is 10.0 ± 1.8 days post symptom (DPS) onset while the 'D0' for the mild group is 1.2 ± 1.3 DPS. In the context of the acute infection discussed here, this difference is critical.
As tempting as it is to conduct longitudinal studies on COVID-19, the authors might do better focusing on specific acute time points (within 10 days post-symptom onset) and convalescent time points (beyond 28 days post-symptom). A better comparison would be D0 severe with D7 mild (aligning the DPS to be between 7-10 days in both groups).
Although enrolment times differed between the mild and severe cohorts, enrolment in both cases coincided with the peak of COVID-19 symptoms, as illustrated in Figure 1B. In support of this criterion, the longitudinal analysis outlined in Figure 3 took into account the changes in gene expression trajectories across all sampling times. This is significant because it exposed several transcriptional programs that were dynamically executed along disease progression, independently of the pairwise comparisons carried out previously. Nevertheless, we agree with the reviewer's observation because, as we mentioned in the article, it is difficult to properly compare disease progression between naturally infected patients. To better support our findings, we complemented them with a pairwise comparison between day-7 samples from mild and day-0 samples from severely ill individuals, finding GO terms and enriched pathways related to NK cell function in the mild cohort, as seen in Figure 4-figure supplement 5. This result reinforced the main findings of the different analyses carried out in this work, highlighting the relevance of the innate immune response of Natural Killer cells, which correlated with mild disease progression. The new paragraph describing this analysis was included on pages 13-14, lines 346-355. We also included a paragraph in the discussion on page 22, lines 595-604.
(2) Though there are four participants within each group, one of the participants with severe infection (S1) only has the D0 time point which probably undermines the statistical significance of the results.
This is an accurate observation, as the statistical weight will allow the deeper alterations to be evaluated while the more subtle ones will most likely be excluded from this study. In our analyses, we focused on variations with high statistical significance, which led to the discovery of a distinct Natural Killer response between mild and severe cohorts.
(3) The authors should also account for any medications administered to the severe group in the ICU before enrolment in the study -immune-dampening drugs or steroids which may alter neutrophil recruitment or other immune functions.
Only one severe patient received medication both prior to and during the COVID-19 disease. Even though several medications were administered to this patient, their effects have not been found to increase the neutrophil response.
(4) What was the viral load status at the different time points analyzed - how does this relate to the immune and clinical findings?
In this recruitment the viral load status was not measured.
(5) Was any complete blood count or basic immune phenotyping conducted on these samples? Important to know the various cell frequencies in the PBMC mix sent for sequencing to account for contamination of lymphocytes with RBCs/monocytes/neutrophils as well as any lymphopenia.
This measurement was not done for these samples. However, our PBMC purification protocol has been tested previously and showed only small amounts of red blood cell contamination. Furthermore, none of the Gene Ontology or enriched-pathway analyses returned terms related to red blood cell genes that could add noise to the interpretation of our results.
(6) The neutrophil/lymphocyte ratio is already skewed during SARS-CoV-2 infection - which could be the reason for higher readings in severe participants? - speculate?
Indeed, the ratios of several cell types change during SARS-CoV-2 infection. However, despite this noise in the proportions of immune cells, several functions in our study are better represented in less abundant cell types such as Natural Killer cells. The co-expression module analysis supports the notion that, although the cell types are present in different proportions, a transcriptional program is being executed differentially in the two cohorts.
(7) CD247/ZAP70 also influences the CD16-mediated NK cell ADCC activity which the authors can add to the innate-adaptive bridging section.
CD16a is highly expressed in NK cells. The circuit involving CD247/ZAP70 and CD16 could explain the cytotoxicity of these cells and how they contribute to mounting a response against SARS-CoV-2 infection. In our study, CD16a (FcgammaRIIIa) expression was similar in the mild and severe cohorts. Because our methodology only captures transcriptional changes, genes that did not change were excluded from our discussion. However, our group's research focuses on this bridge between the innate and adaptive immune responses, with particular emphasis on Fc-mediated antibody functions, and this is a topic of interest for future research.
(8) Some of the figures lacked clarity making it difficult to review. (Eg. Fig 4 A, Fig 4 - supplement 1 A&B, Fig 5).
Figure 4A was redesigned, Figure 4-figure supplement 1 was presented in a full page for better resolution.
Specific Comments:
(1) Consider changing "covid-19" in the title of the manuscript to "COVID-19"
Probably the journal platform changes the letters. The original title is in capital letters according to the observation. In the clinical table “COVID-19” was changed to capital letters.
(2) Page 2: Line 24 - Consider revising this line. Not sure what the authors mean by 'early compromise'
The paragraph was revised and rewritten.
Original:
“Mild COVID-19 patients presented an early compromise with NK cell function, whereas severe patients do so with neutrophil function”
Changed to:
“Mild COVID-19 patients displayed an early transcriptional commitment with NK cell function, whereas severe patients do so with neutrophil function”
(3) Page 4: Lines 57 & 58 - Verify the reference. The paper referenced was published in 2016 and is in regard to SARS-CoV, MERS-CoV, and enterovirus D68.
Indeed, this reference was intended to draw parallels with other respiratory viruses. To strengthen the emphasis on SARS-CoV-2, two additional references were added to the paragraph: Shen 2023 and Wauters 2022.
(4) Page 10: Lines 229 - 234 - Consider referring to the appropriate figure (i.e., Figure Supplement 2 A or B). The figure associated with D28 DEGs (Volcano plot) is missing in the supplementary. Erroneously referred here as Figure 1C which is a PCA plot?
The original text was changed because the figure referenced was correct but misunderstood. The final sentence is on page 9, lines 220-223.
(5) Page 10: Line 224 - Change the sentence to " We found upregulated.." instead of " We found regulated..".
The text was edited in accordance with this recommendation, which is currently found in line 232.
(6) Page 13: Line 326 - Figure 4A referenced here is not clear - unable to review.
Figure 4A was updated for a better resolution and included in the manuscript.
(7) Page 15: Line 398 - Consider rewording "after diagnosis" since the days here are "after enrolment".
This recommendation was considered and the text was rewritten on page 15, lines 404-406:
Original:
“We systematically analyzed transcriptomic features of PBMCs from COVID-19 patients with mild and severe symptoms at three sequential time-points (D0, D7, and D28) after diagnosis”
Changed to:
“We systematically analyzed transcriptomic features of PBMCs from COVID-19 patients with mild and severe symptoms at three sequential time-points (D0, D7, and D28) during the peak of the symptoms”
(8) Page 17: Move text from the next page to eliminate blank space.
Resolved
(9) Page 32: Figure 1C - Consider changing the symbol for D28 since it looks very similar to the D0 symbol. Use the colors consistently instead of different shades for each group.
The hexagon symbol for D28 was replaced with a star symbol in figure 1C. In this figure, each color indicates one of the three groups, and transparency was used to distinguish symbols that lie close together.
(10) Page 36: Figure 4A - Unable to review.
This figure was resized for better resolution.
(11) Page 42-49: Consider relabeling and renumbering the Supplementary figures for consistency and reference the modified numbers in the appropriate location in the main text.
The supplementary figures were relabeled for consistency and better understanding.
(12) Pages 44 & 48: Unable to review the figures.
The figures indicated were resized for better resolution.
Examples of consistency review:
(1) Use of D0,D7 / D-0, D-7 throughout the manuscript
The selected format for the final version of the manuscript is D0, D7, and D28.
(2) Reporting the source of reagents consistently (Name, Place, Country, Catalog number)
The source reagents were reformatted for consistency in lines 626-628-632-642.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
The authors performed extensive coarse-grained molecular dynamics simulations of 140 different prion-like domain variants to interrogate how specific amino acid substitutions determine the driving forces for phase separation. The analyses are solid, and the predictive scaling laws can aid in identifying potential phase-separating regions in uncharacterized proteins. Overall, this is a valuable contribution to the field of biomolecular condensates. It exemplifies how data-driven methodologies can uncover new insights into complex biological phenomena.
-
Reviewer #1 (Public Review):
Summary:
In this preprint, the authors systematically and rigorously investigate how specific classes of residue mutations alter the critical temperature as a proxy for the driving forces for phase separation. The work is well executed, the manuscript well-written, and the results reasonable and insightful.
Strengths:
The introductory material does an excellent job of being precise in language and ideas while summarizing the state of the art. The simulation design, execution, and analysis are exceptional and set the standard for these types of large-scale simulation studies. The results, interpretations, and Discussion are largely nuanced, clear, and well-motivated.
Weaknesses:
This is not exactly a weakness, but I think it would future-proof the authors' conclusions to clarify a few key caveats associated with this work. Most notably, given the underlying implementation of the Mpipi model, temperature dependencies for intermolecular interactions driven by solvent effects (e.g., hydrophobic effect and charge-mediated interactions facilitated by desolvation penalties) are not captured. This itself is not a "weakness" per se, but it means I would imagine CERTAIN types of features would not be well-captured; notably, my expectation is that at higher temperatures, proline-rich sequences drive intermolecular interactions, but at lower temperatures, they do not. This is likely also true for the aliphatic residues, although these are found less frequently in IDRs. As such, it may be worth the authors explicitly discussing.
Similarly, prior work has established the importance of an alpha-helical region in TDP-43, as well as the role of aliphatic residues in driving TDP-43's assembly (see Schmidt et al 2019). I recognize the authors have focussed here on a specific set of mutations, so it may be worth (in the Discussion) mentioning [1] what impact, if any, they expect transient or persistent secondary structure to have on their conclusions and [2] how they expect aliphatic residues to contribute. These can and probably should be speculative as opposed to definitive.
Again - these are not raised as weaknesses in terms of this work, but the fact they are not discussed is a minor weakness, and the preprint's use and impact would be improved on such a discussion.
-
Reviewer #2 (Public Review):
This is an interesting manuscript where a CA-only CG model (Mpipi) was used to examine the critical temperature (Tc) of phase separation of a set of 140 variants of prion-like low complexity domains (PLDs). The key result is that Tc of these PLDs seems to have a linear dependence on substitutions of various sticker and spacer residues. This is potentially useful for estimating the Tc shift when making novel mutations of a PLD. However, I have strong reservations about the significance of this observation as well as some aspects of the technical detail and writing of the manuscript.
(1) Writing of the manuscript: The manuscript can be significantly shortened with more concise discussions. The current text reads as very wordy in places. It even appears that the authors may be trying a bit too hard to make a big deal out of the observed linear dependence.
The manuscript needs to be toned down to minimize self-promotion throughout the text. Some of the glaring examples include the wording "unprecedented", "our research marks a significant milestone in the field of computational studies of protein phase behavior ..", "Our work explores a new framework to describe, quantitatively, the phase behavior ...", and others.
There is really little need to emphasize the need to manage a large number of simulations for all 140 variants. Yes, some thought needs to go into designing and managing the jobs and organizing the data, but this is pretty standard in computational studies. For example, large-scale protein-ligand free energy calculations can require one to a few orders of magnitude more runs, and this is pretty routine.
When discussing the agreement with experimental results on Tm, it should be noted that the values of R > 0.93 and RMSD < 14 K are based on only 16 data points. I am not sure that one should refer to this as "extended validation". It is more like a limited validation given the small data size.
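To give a sense of how much 16 data points constrain a correlation coefficient, the sketch below computes an approximate 95% confidence interval using the Fisher z-transformation. It treats r = 0.93 as a point estimate (the text reports R > 0.93) and assumes approximately bivariate-normal data, so it is a rough guide and not a reanalysis of the manuscript's data.

```python
import math

def pearson_r_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). z_crit = 1.96 gives a 95% interval.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = pearson_r_ci(0.93, 16)
print(f"Approximate 95% CI for r = 0.93 with n = 16: [{low:.2f}, {high:.2f}]")
```

The interval is wide and asymmetric at this sample size, which is consistent with describing the comparison as a limited validation.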
Results of linear fitting shown in Eq 4-12 should be summarized in a single table instead of being scattered across multiple pages.
The title may also be toned down a bit given the limited significance of the observed linear dependence.
(2) Significance and reliability of Tc: Given the simplicity of Mpipi (a CA-only model that can only describe polymer chain dimension) and the low complexity nature of PLDs, the sequence composition itself is expected to be the key determinant of Tc. This is also reflected in various mean-field theories. It is well known that other factors will contribute, such as patterning (examined in this work as well), residual structures, and conformational preferences in dilute and dense phases. The observed roughly linear dependence is a nice confirmation but really unsurprising by itself. It appears that the way many of the constructs deviate from the expected linear dependence (e.g., Figure 4A) may be more interesting to explore.
The assumption that all systems investigated here belong to the same universality class as a 3D Ising model and the use of Eqn 20 and 21 to derive Tc is poorly justified. Several papers have discussed this issue, e.g., see Pappu Chem Rev 2023 and others. Muthukumar and coworkers further showed that the scaling of the relevant order parameters, including the conserved order parameter, does not follow the 3D Ising model. More appropriate theoretical models including various mean field theories can be used to derive binodal from their data, such as using Rohit Pappu's FIREBALL toolset. Imposing the physics of the 3D Ising model as done in the current work creates challenges for equivalence relationships that are likely unjustified.
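The sensitivity of Tc to the assumed universality class can be illustrated with a minimal sketch. It assumes the scaling form commonly used for slab-coexistence data, rho_high - rho_low = A (1 - T/Tc)^beta, with beta fixed at the 3D Ising value of about 0.325. The manuscript's Eqn 20 and 21 are not reproduced here, so this form, the function name `fit_tc`, and the synthetic densities are all assumptions made for illustration.

```python
BETA_ISING_3D = 0.325  # fixed critical exponent; this is the assumption under debate

def fit_tc(temps, rho_high, rho_low, beta=BETA_ISING_3D):
    """Estimate Tc from rho_high - rho_low = A * (1 - T/Tc)**beta.

    Raising the density difference to the power 1/beta gives a straight
    line in T, y = a + b*T, whose zero crossing is Tc = -a/b. The line is
    fitted by ordinary least squares.
    """
    ys = [(h - l) ** (1.0 / beta) for h, l in zip(rho_high, rho_low)]
    n = len(temps)
    t_mean = sum(temps) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in temps)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(temps, ys))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return -a / b

# Synthetic coexistence densities generated from the Ising form with Tc = 300 K.
temps = [250.0, 260.0, 270.0, 280.0, 290.0]
true_tc, amplitude = 300.0, 1.0
delta = [amplitude * (1 - t / true_tc) ** BETA_ISING_3D for t in temps]
rho_low = [0.05] * len(temps)
rho_high = [l + d for l, d in zip(rho_low, delta)]

tc_ising = fit_tc(temps, rho_high, rho_low)
# Same data, but fitted with the mean-field exponent beta = 0.5.
tc_mean_field = fit_tc(temps, rho_high, rho_low, beta=0.5)
print(tc_ising, tc_mean_field)
```

On the same densities, the two exponents give different Tc estimates, so the choice of exponent is part of the result and would ideally be justified or varied in a sensitivity check.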
While it has been a common practice to extract Tc when fitting the coexistence densities, it is not a parameter that is directly relevant physiologically. Instead, Csat would be much more relevant to think about if phase separation could occur in cells.
-
Reviewer #3 (Public Review):
Summary:
"Decoding Phase Separation of Prion-Like Domains through Data-Driven Scaling Laws" by Maristany et al. offers a significant contribution to the understanding of phase separation in prion-like domains (PLDs). The study investigates the phase separation behavior of PLDs, which are intrinsically disordered regions within proteins that have a propensity to undergo liquid-liquid phase separation (LLPS). This phenomenon is crucial in forming biomolecular condensates, which play essential roles in cellular organization and function. The authors employ a data-driven approach to establish predictive scaling laws that describe the phase behavior of these domains.
Strengths:
The study benefits from a robust dataset encompassing a wide range of PLDs, which enhances the generalizability of the findings. The authors' meticulous curation and analysis of this data add to the study's robustness. The scaling laws derived from the data provide predictive insights into the phase behavior of PLDs, which can be useful in the future for the design of synthetic biomolecular condensates.
Weaknesses:
While the data-driven approach is powerful, the study could benefit from more experimental validation. Experimental studies confirming the predictions of the scaling laws would strengthen the conclusions. For example, in Figure 1, the Tc of TDP-43 is below 300 K even though it can undergo LLPS under standard conditions. Figure 2 clearly highlights the quantitative accuracy of the model for hnRNPA1 PLD mutants, but its applicability to other systems such as TDP-43, FUS, TIA1, EWSR1, etc., may be questionable.
The authors may wish to consider checking if the scaling behavior is only observed for Tc or if other experimentally relevant quantities such as Csat also show similar behavior. Additionally, providing more intuitive explanations could make the findings more broadly accessible.
The study focuses on a particular subset of intrinsically disordered regions. While this is necessary for depth, it may limit the applicability of the findings to other types of phase-separating biomolecules. The authors may wish to discuss why this is not a concern. Some statements in the paper may require careful evaluation for general applicability, and I encourage the authors to exercise caution while making general conclusions. For example, "Therefore, our results reveal that it is almost twice more destabilizing to mutate Arg to Lys than to replace Arg with any uncharged, non-aromatic amino acid..." This may not be true if the protein has a lot of negative charges.
I am surprised that a quarter of a million CPU hours are described as staggering in terms of computational requirements.
-
-
www.researchsquare.com www.researchsquare.com
-
eLife assessment
This valuable study tests the functional role of food-washing behavior in removing tooth-damaging sand and grit in long-tailed macaques and whether dominance rank predicts the level of investment in the behavior. The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank is incomplete given confounding between sex and rank and limited sample size. A more careful and perhaps restrained interpretation of the findings, as well as a connection to the existing literature on optimal foraging theory, would increase the value of the study to its intended audience, i.e. researchers interested in foraging behavior, cognition, and primate evolution.
-
Reviewer #1 (Public Review):
Summary:
In this paper, the authors had 2 aims:
(1) Measure macaques' aversion to sand and see if its removal is intentional, as it is likely an unpleasant sensation that causes tooth damage.
(2) Test whether monkeys engage in suboptimal behavior by cleaning foods beyond the point of diminishing returns, and see if this is related to individual traits such as sex and rank, and to behavioral technique.
They attempted to achieve these aims through a combination of geochemical analysis of sand, field experiments, and comparing predictions to an analytical model.
The authors' conclusions were that they verified a long-standing assumption that monkeys have an aversion to sand as it contains many potentially damaging fine-grained silicates and that removing it via brushing or washing is intentional.
They also concluded that monkeys will clean food for longer than is necessary, i.e. beyond the point of diminishing returns, and that this is rank-dependent.
High and low-ranking monkeys tended not to wash their food, but instead over-brushed it, potentially to minimize handling time and maximize caloric intake, despite the long-term cumulative costs of sand.
This was interpreted through the *disposable soma hypothesis*, where dominants maximize immediate needs to maintain rank and increase reproductive success at the potential expense of long-term health and survival.
Strengths:
The field experiment seemed well-designed, and their quantification of physical and mineral properties of quartz particles (relative to human detection thresholds) seemed good relative to their feret diameter and particle circularity (to a reviewer who is not an expert in sand). The *Rank Determination* and *Measuring Sand* sections were clear.
In achieving Aim 1, the authors validated a commonly interpreted, but unmeasured, function of macaque and primate behavior, a key study/finding in primate food processing and cultural transmission research.
I commend their approach in developing a quantitative model to generate predictions to compare to empirical data for their second aim.
This is something others should strive for.
I really appreciated the historical context of this paper in the introduction, and found it very enjoyable and easy to read.
I do think that interpreting these results in the context of the *disposable soma hypothesis* and the potential implications in the *paleolithic matters* section about interpreting dental wear in the fossil record are worthwhile.
Weaknesses:
Most of the weaknesses in this paper lie in statistical methods, visualization, and a missing connection to the marginal value theorem and optimal foraging theory.
I think all of these weaknesses are solvable.
The data and code were not submitted. Therefore I was unable to better understand the simulation or to provide useful feedback on the stats, the connection between the two, and its relevance to the broader community.
(1) Statistics:
(a) AIC and outcome distributions
The use of AIC for hierarchical models, and models with different outcome distributions brought up several concerns.
The authors appear to use AIC to help inform which model to use for their primary analyses in Tables S1 and S2. It is unclear which of these models are analyzed in Tables S3 and S4.
AIC should not be used on hierarchical models, and something like WAIC (or DIC which has other caveats) would be more appropriate.
Also, using information criteria on Mixture Models like Negative Binomials (aka Gamma-Poisson) should be done with extreme caution, or not at all, as the values are highly sensitive to the data structure.
Some researchers also say that information criteria should not be used to compare models with different outcome distributions - although this might be slightly less of a concern as all of your models are essentially variations on a Poisson GLM.
Discussion on this can be found in McElreath Statistical Rethinking (Section 12.1.3) and Gelman et al. BDA3 (Chapter 7).
Choosing an outcome distribution based on your understanding of the data-generating process is a better approach than relying on AIC, especially in this context, where AIC can be misleading.
(b) Zeros
I also had some concerns about how zeros were treated in the models.
In lines 217-218, they mentioned that "if a monkey consumed a cucumber slice without brushing or washing it, the zero-second duration was included in both GLMMs."
This zero implies no processing and should not be treated as a length 0 duration of processing.
This suggests to me that a zero-inflated Poisson or zero-inflated negative binomial would be the best choice for modelling the data, as it is essentially a 2-step process:
(i) Do they process the cucumber at all?
(ii) If so, do they wash or brush, and how is this predicted by rank and treatment?
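A minimal sketch of the zero-inflated Poisson idea is given below. It treats processing duration as a count of whole seconds, with a separate probability `pi_zero` that a slice is not processed at all. The parameter values are arbitrary and not estimated from the study, and in practice both components would be modelled with predictors such as rank and treatment in a mixed-model framework.

```python
import math

def zip_pmf(k, lam, pi_zero):
    """Probability of observing count k under a zero-inflated Poisson.

    pi_zero: probability of a structural zero (no processing at all).
    lam:     Poisson mean of the count when processing can occur.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can arise from "no processing" or from the Poisson part.
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson

# Arbitrary illustrative parameters.
p0 = zip_pmf(0, lam=2.0, pi_zero=0.3)
total = sum(zip_pmf(k, lam=2.0, pi_zero=0.3) for k in range(50))
print(p0, total)
```

The excess of zeros relative to a plain Poisson with the same mean is what the zero-inflation component absorbs, which keeps unprocessed slices from dragging down the estimated processing duration.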
(2) Absence of Links to Foraging Theory
Optimal cleaning time model: the optimality model was not well described including how it was programmed. Better description and documentation of this model, along with code (Mathematica judging from the plot?) is needed.
There seems to be much conceptual and theoretical overlap with foraging theory models that were not well described - namely the *marginal value theorem (Charnov 1976; Krebs et al. 1974) and its subsequent advances* (see https://doi.org/10.1016/j.jaa.2016.03.002 and https://doi.org/10.1086/283929 for examples).
In the suggestions, I attached the R code where I replicated their model to show that it is *mathematically identical to the marginal value theorem*. This was not mentioned at all in the text or citations.
This literature has been well studied since the 1970s, and there is a history of studies that compare behavior to an optimality model and find instances where animals conform to or diverge from its predictions (https://doi.org/10.1146/annurev.es.15.110184.002515). This link should be highlighted, and interpreting it in that theoretical context will make it more broadly applicable to behavioral ecologists.
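For readers unfamiliar with the marginal value theorem, a generic sketch follows. It is not the manuscript's model or the reviewer's attached R code. It assumes a saturating gain g(t) = t / (t + h) after t seconds of cleaning, where h is a half-saturation constant, and a fixed time cost c per food item; both the functional form and the meaning given to c and h here are assumptions. Under these assumptions the time that maximizes the long-term rate g(t) / (t + c) has the closed form t* = sqrt(h * c), which the code checks against a grid search.

```python
import math

def gain(t, h):
    """Saturating gain after t seconds; h is the half-saturation constant."""
    return t / (t + h)

def rate(t, h, c):
    """Long-term rate of gain when each item also costs a fixed time c."""
    return gain(t, h) / (t + c)

def optimal_time_numeric(h, c, t_max=60.0, step=0.001):
    """Grid search for the cleaning time that maximizes the rate."""
    best_t, best_rate = 0.0, 0.0
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        r = rate(t, h, c)
        if r > best_rate:
            best_t, best_rate = t, r
    return best_t

h, c = 4.0, 9.0                      # arbitrary illustrative values, in seconds
t_closed = math.sqrt(h * c)          # closed-form optimum under the assumptions above
t_numeric = optimal_time_numeric(h, c)
print(t_closed, t_numeric)
```

Cleaning for longer than t* lowers the long-term rate, which is the sense in which observed cleaning times can be compared against a point of diminishing returns.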
The data was subsetted to include instances where there were < 3 monkeys present to avoid confounds of rank, but it is important to know that optimal behavior might vary by individual, and can change in a social context depending on rank (see https://doi.org/10.1016/j.tree.2022.06.010). Discussion of this, and further exploration of it in the data would strengthen the overall contribution of this manuscript to the field, but I understand that the researchers wish to avoid that in this paper for it is a complex topic, which this dataset is uniquely suited to address.
(3) Interpretation and validity of model relative to data
In lines 92-102, they present summary statistics (I think) showing that time spent brushing and washing is consistent with washing or brushing to remove sand.
In the **mitigating tooth wear** section (line 73) and corresponding Figure S1 showing surface sand removed, more detail about how these numbers were acquired, and statistical modelling, is needed.
This is important as uncertainty and measurement error around these metrics are key to the central finding and interpretation of Aim 2 in this paper.
It appears that the researchers simulated the monkey's brushing and washing behaviors (similar to https://doi.org/10.1007/s10071-009-0230-3).
How many researchers simulated monkey behavior and how many times?
What are the repeat points in Figure S1?
What is the number of trials or number of people?
This effect appears stronger for washing than brushing as well - if so, why?
More info about this data, and the uncertainty in this is important, as it is key to the second central claim of this paper.
The estimates of removing between 76% +/- 7 and 93% +/- 4 of sand (visualized in Figure S1), are statistical estimates.
I would find the argument more convincing if the conclusion that food is processed for too long, beyond the point of diminishing returns, still held after propagating the uncertainty in handling time, sand removal rates, and the corresponding half-saturation constants.
It is very possible that, after accounting for uncertainty and variation in handling time and removal rates, the second result may not hold true.
I was not able to convince myself of this via reanalysis as the description of the data in the text was not enough to simulate it myself.
Essentially, this would imply that in Figure 3 the predicted value would have some variation around it (informed by boundary conditions of time being positive, and percents having floors and ceilings) and that a range of predicted cleaning times (optimal give-up times) would be plotted in Figure 3.
This could be accomplished with a Bayesian approach, or by simply plotting multiple predictions given some confidence interval around c and h.
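The simpler of the two options could look like the sketch below, which propagates uncertainty in c and h into a range of predicted give-up times by Monte Carlo sampling. Purely for illustration, it assumes an optimum of the form t* = sqrt(h * c), which is what a saturating gain t / (t + h) with a fixed per-item time cost c would give under the marginal value theorem. The parameter ranges are invented and the uniform sampling is an assumption; the manuscript's own optimality model would replace `optimal_time`.

```python
import math
import random

def optimal_time(h, c):
    """Assumed closed-form optimum; stands in for the manuscript's model."""
    return math.sqrt(h * c)

def give_up_time_interval(h_range, c_range, n_draws=10000, seed=1):
    """Central 95% range of predicted give-up times.

    h and c are drawn independently and uniformly from their ranges,
    which is a simplifying assumption.
    """
    rng = random.Random(seed)
    draws = sorted(
        optimal_time(rng.uniform(*h_range), rng.uniform(*c_range))
        for _ in range(n_draws)
    )
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return lower, upper

# Invented ranges (seconds), for illustration only.
low, high = give_up_time_interval(h_range=(3.0, 5.0), c_range=(7.0, 11.0))
print(low, high)
```

Observed cleaning times that fall outside such a band would support the claim of over-cleaning more strongly than a comparison against a single predicted value.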
-
Reviewer #2 (Public Review):
Summary:
This field experiment aimed to assess what motivates macaque monkeys to clean food items prior to consumption and the relative costs and benefits of different cleaning approaches (manually brushing sand from food versus dousing food items in water). The experiment teases apart if/how the benefits of these approaches are mediated by the amount of debris on food and the monkeys' rank in terms of the costs of consuming sand versus the time and energy required to remove it. The authors not only examined the behavioral responses of wild macaques to three conditions of food sand contamination but also tested the relative costs of consuming different levels and sizes of sand particulates. Through this, the authors propose considerations of the macaques' motivations to clean food and the balance they take in energetic gains from consuming food versus the costs of cleaning food and consuming sand. Their data reveal that food washing is more effective in removing sand, but more costly than manually brushing off sand. This study also revealed that only mid-ranked monkeys washed their food, while high and low-ranked monkeys were more likely to remove sand via brushing it off food with their hands.
Strengths:
This study provides a very in-depth consideration of the motivations of macaques to clean their food, and the relative costs and benefits of different food cleaning techniques. Not only did the study test the behavior of wild macaques via a simple yet elegant field study, but they also performed a detailed analysis of the sand particulates to understand the level of potential tooth wear that consuming it could result in. By relying on a wild group of macaques that have been part of a long-term study site, the team also had detailed behavioral data on the population to allow for rank assessments of the animals. This comprehensive study provides important foundational information for a better understanding of how and why macaques clean food, which informs existing and future considerations of this as a potential cultural behavior.
Weaknesses:
As currently written, the paper does not provide sufficient background on this population of animals and their prior demonstrations of food-cleaning behavior or other object-handling behaviors (e.g., stone handling). Moreover, the authors' conclusions focus on the behavior of high-ranked animals, but subordinate animals also showed similar behavioral patterns and they should be considered in more detail too.
-
Reviewer #3 (Public Review):
This paper provides evidence that food washing and brushing in wild long-tailed macaques are deliberate behaviors to remove sand that can damage tooth enamel. The demonstration of the immediate functional importance of these behaviors is nicely done. However, the paper also makes the claim that macaques systematically differ in their investment in food cleaning because of rank-dependent differences in their costs and benefits. This latter conclusion is not, in my view, well-supported, for several reasons.
First, as is typical in many primate studies, the authors construct sex-specific ordinal rank hierarchies. This makes sense since hierarchies for males and hierarchies for females are determined by different processes and have different consequences. However, if I understand it correctly, they are then lumped together in all statistical analyses of rank, which makes the apparent rank effect very difficult to understand. The challenge of interpretation is increased because there are twice as many adult females in the group as adult males, so the rank is confounded by sex (because all low-rank values are adult females).
Second, because only one social group is being studied, the conclusions about rank may be heavily driven by individual identity, not rank per se. An analysis involving replicate social groups (which granted, may be impossible here) or longitudinal data showing a change in behavior following a change in rank would be much more compelling.
Third, there is no evidence presented on the actual fitness-related costs of tooth wear or the benefits of slightly faster food consumption. Support for these arguments is provided based on other papers, some of which come from highly resource-limited populations (and different species). But this is a population that is supplemented by tourists with melons, cucumbers, and pineapples! In the absence of more direct data on fitness costs and benefits, the paper makes overly strong claims about the ability to explain its observations based on "immediate energetic requirements" (abstract), "difference...freighted with fitness consequences" (line 80), and "pressing energetic needs"/"live fast, die young" (lines 121-122--there is no evidence that tooth wear is associated with morbidity or mortality here). The idea that high-ranking animals are "sacrificing their teeth at the altar of high rank" seems extreme.
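The confound raised in the first point above can be made concrete with a minimal sketch. The group sizes below (10 adult females, 5 adult males) are hypothetical and chosen only to match the stated 2:1 ratio; they are not taken from the paper. The sketch pools sex-specific ordinal ranks and lists which rank values are held by one sex only.

```python
from collections import defaultdict

# Hypothetical group composition (assumption): twice as many adult females as males.
N_FEMALES = 10
N_MALES = 5


def pooled_ranks(n_females, n_males):
    """Return (sex, ordinal_rank) pairs; rank 1 is the top rank within each sex."""
    females = [("F", rank) for rank in range(1, n_females + 1)]
    males = [("M", rank) for rank in range(1, n_males + 1)]
    return females + males


def sexes_by_rank(pairs):
    """Map each ordinal rank value to the set of sexes that hold that value."""
    holders = defaultdict(set)
    for sex, rank in pairs:
        holders[rank].add(sex)
    return dict(holders)


by_rank = sexes_by_rank(pooled_ranks(N_FEMALES, N_MALES))

# Rank values that occur in only one sex once the two hierarchies are lumped together.
female_only = sorted(rank for rank, sexes in by_rank.items() if sexes == {"F"})
print("Rank values held only by females:", female_only)
# prints: Rank values held only by females: [6, 7, 8, 9, 10]
```

Under these assumed numbers, every rank value below 5 belongs to a female, so any apparent effect of low rank in a pooled analysis cannot be separated from an effect of sex.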
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
The present paper describes an important methodological development that combines light (confocal) microscopy with scanning and transmission EM and EM tomography. The method expands the level of structural detail accessible to large-volume EM studies and thus represents an approach to integrate analyses of cellular and sub-cellular structures in biological samples. The study, which provides a compelling proof-of-principle, will be of particular value to cell biologists interested in the in-depth interpretation of high-resolution ultrastructural information from sparsely distributed targets - at multiple scales and in diverse biological structures.
-
Reviewer #1 (Public Review):
The present paper presents a new, simple, and cost-effective technique for multimodal EM imaging that combines the strengths of volume scanning electron microscopy (SEM) and electron microscopic tomography. The novel ATUM-Tomo approach enables the consecutive inspection of selected areas of interest by correlated serial SEM and TEM, optionally in combination with CLEM, as demonstrated. The most important finding of ATUM-Tomo and particularly correlative ATUM-Tomo is that it can bridge several scales from the cellular to the high-resolution subcellular scale, from the micrometer to low nanometer resolution, which is particularly important for the ultrastructural analysis of biological regions of interest as demonstrated here by focal pathologies or rare cellular and subcellular structures. Both imaging modalities are non-destructive, thus allowing re-imaging and hierarchical imaging at the SEM and TEM levels, which is particularly important for precious samples, such as human biopsies or specimens from complex CLEM experiments. The paper demonstrates that the new approach is very helpful in analyses of pathologically altered brains, including human brain tissue samples, that require high-resolution SEM and TEM in combination with immunohistochemistry for analysis. Even the combination with tracers would be possible. In sum, ATUM-Tomo opens up new possibilities in multimodal volume EM imaging for diverse biological areas of research.
Strengths
This paper is a very nice piece of work. It combines modern, high-end, state-of-the-art technology that allows to investigate diverse biological questions in different fields and at multiple scales. The paper is clear and well-written. It is accompanied by excellent figures, supplementals, and colored 3D-reconstructions that make it easy for the reader to follow the experimental procedure and the scientific context alike.
Weaknesses
There is a bit of an imbalance between the description of the state-of-the-art methodology and the scientific context. The discussion of the latter could be expanded.
-
Reviewer #2 (Public Review):
Kislinger et al. present a method permitting a targeted, multi-scale ultrastructural imaging approach to bridge the resolution gap between large-scale scanning electron microscopy (SEM) and transmission electron microscopy (TEM). The key methodological development consists of an approach to recover sections of resin-embedded material produced by Automated Tape Collecting Ultramicrotomy (ATUM), thereby permitting regions of interest identified by serial section SEM (ATUM-SEM) screening to be subsequently re-examined at higher resolution by TEM tomography (ATUM-Tomo). The study shows that both formvar and permanent marker coatings are in principle compatible with solvent-based release of pre-screened sections from ATUM tape (carbon nanotube or Kapton tape). However, a comparative analysis of potential limitations and artifacts introduced by these respective coatings revealed permanent marker to provide a superior coating; permanent marker coatings are more easily and reliably applied to tape with only minor contaminants affecting the recovered section-tape interface with negligible influence on tomogram interpretation. Convincing proof-of-principle is provided by integrating this novel ATUM-Tomo technique into a technically impressive correlated light and electron microscopy (CLEM) approach specifically tailored to investigate ultrastructural manifestations of trauma-induced changes in blood-brain barrier permeability.
Strengths
Schematics and figures are very well-constructed, illustrating the workflow in a logical and easily interpretable manner. Light and electron microscope image data are of excellent quality, and the efficacy of the ATUM-Tomo approach is evidenced by a qualitative assessment of ATUM-SEM performance using coated tape variants and a convincing correlation between scanning and transmission electron microscopy imaging modalities. Potential ultrastructural artifacts induced via solvent exposure and any subsequent mechanical stress incurred during section detachment were thoroughly and systematically investigated using appropriate methods and reported with commendable transparency. In summary, the presented data convincingly support the claims of the study. A major strength of this work includes its general applicability to a broad range of biological questions and ultrastructural targets demanding resolutions exceeding that obtained via serial section and destructive block-face imaging approaches alone. The level of methodological detail provided is sufficient for replication of the ATUM-Tomo technique in other laboratories. Consequently, this relatively simple and cost-effective technique is widely adoptable by electron microscopy laboratories, and its integration into existing ATUM-SEM workflows supports a versatile and non-destructive imaging regime enabling high-resolution details of targeted structures to be interpreted within anatomical and subcellular contexts.
Weaknesses
I find no significant weaknesses in the current version of the manuscript.
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1: The authors may consider moving the supplemental figures into the main body of the paper since they finally would end up with a total of eight figures.
As we added two more supplementary figures, we left them separated from the main part of the manuscript in the supplement. All of them describe important experimental details but we believe that it is easier to follow if there is a focus on the key results.
Reviewer #1: In general, apart from some required but important additions, the methods and techniques used here are described in sufficient detail.
Reviewer #2: Given the identified importance of glow-discharge treatment of precoated tape to the flat deposition of sections during ATUM, a corresponding schematic or appropriate reference(s) providing more information about the custom-built tape plasma device would likely be a prerequisite for effective reproduction of this technique in other laboratories.
Thank you for the valuable comments on the missing experimental details, which could affect the ease of establishing ATUM-Tomo in other labs. We will clearly highlight the ATUM-Tomo-specific vs. some general EM processing steps of the workflow in the proposed way. A detailed description of the custom-built tape plasma device will be added to the methods section. In addition, we will reference more explicitly our published protocols, which describe the standard electron microscopy embedding steps in great detail (Kislinger et al., STAR protocols, 2020; Kislinger et al., Meth Cell Biol, 2023).
Reviewer #1: Concerning the results section: In my opinion, the results section is a bit unbalanced. There is a mismatch between the detailed description of the methodology (experimental approach) and the scientific findings of the paper. The reviewer can see the enormous methodological impact of the paper, which on the other hand is the major drawback of the paper. In my opinion, the authors should also give a more detailed description of their scientific results.
Concerning the discussion: It would have been nice to give a perspective to which the described methodology can be used not only to describe diverse biological aspects that can be addressed and answered by this experimental approach. For example, how could this method be used to address various questions about the normal and pathologically altered brain?
In my opinion, the paper has one major drawback which is that it is more methodologically based although the authors included a scientific application of the method. The question here is to balance the methodology vs. the scientific achievement of this paper, a decision hard to take. In other words, one could recommend this paper to more methodologically based journals, for example, Nature Methods.
Balancing the technological and biological parts is indeed a difficult issue. We agree that this manuscript mainly describes a technical advancement and demonstrates its power to answer previously unsolved scientific questions. We exemplify this in our model system, neuropathology of the blood-brain barrier. The biological impact of ATUM-SEM has been described in detail in Khalin et al., Small, 2022, and is referenced accordingly. Here we describe how ATUM-Tomo can be applied to reveal biological insights exceeding the capabilities of ATUM-SEM and other volume electron microscopy techniques. However, the description of the methodological development by far outweighs that of the biological details. We consider eLife’s Tools and Resources (which, in our view, is similar in scope to Nat Methods) an ideal format for this technically focused manuscript while targeting eLife’s readership with diverse biological fields of interest for potential applications of the method. We suggested the application in connectomics (for chemical synapses), the study of endocytosis and the detection of virus particles in the discussion. Hopefully, this accommodates the Reviewer’s concern that having only a single application might seem arbitrary or even suggest a very narrow utility of the technique.
“While we demonstrate a neuropathology-related application, further biological targets that require high-resolution isotropic voxels and the spatial orientation within a larger ultrastructural context can potentially be studied by ATUM-Tomo. This includes the detection of gap junctions for connectomics or for the study of long-range projections (Holler et al., 2021) and the subcellular location of virus particles (Wu et al., 2022, Roingeard, 2008, Pelchen-Matthews and Marsh, 2007). Thus, ATUM-Tomo opens up new avenues for multimodal volume EM imaging of diverse biological research areas.”
Reviewer #2: Is the separation of sections from permanent marker-treated tape sensitive to the time interval between deposition/SEM imaging and acetone treatment?
Thank you for pointing out this important methodological aspect. We have not systematically investigated whether there is a critical time window between microtomy, SEM, and detachment. From the samples generated for this study, we assessed the importance of timing in retrospect:
“The sections could be recovered even four months after collection and nine months after SEM imaging.”
Reviewer #2: To what extent is slice detachment from permanent marker-treated tape resin-dependent [i.e. has ATUM-Tomo been tested on resin compositions beyond LX112 (LADD)]?
We appreciate this comment addressing the broader technical applicability of ATUM-Tomo. We tested the general workflow with tissue embedded in other commonly used resin types (epon, durcupan).
Reviewer #2: Minor corrections to the text and figures.
Line 83: ((Khalin et al., 2022) should read (Khalin et al., 2022)
Line 86: 30nm should read 30 nm
Line 139: "...morphological normal tight junctions..." should read "...morphologically normal tight junctions..."
Line 283: "....despite glutaraldehyde fixation, a prerequisite for optimal ultrastructural preservation...".
Line 295: "In contrast, our CLEM approach provides high ultrastructural quality by optimal chemical fixation".
The concepts of optimal preservation and optimal fixation are arguably context- and application-dependent. These statements should be toned down or their context explicitly stated.
Thank you for the detailed corrections. We have applied them accordingly.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
The study presents valuable findings on compensatory mechanisms in response to glycosuria. The evidence supporting the claims is solid, although a causal relationship is somewhat uncertain and the addition of a more clinically relevant model would have strengthened the findings. The work will be of interest to diabetes investigators.
-
Reviewer #1 (Public Review):
Summary:
In this study, Faniyan and colleagues build on their recent finding that renal Glut2 knockout mice display normal fasting blood glucose levels despite massive glucosuria. Renal Glut2 knockout mice were found to exhibit increased endogenous glucose production along with decreased hepatic metabolites associated with glucose metabolism. Crh mRNA levels were higher in the hypothalamus while circulating ACTH and corticosterone were elevated in this model. While these mice were able to maintain normal fasting glucose levels, ablating afferent renal signals to the brain caused low fasting blood glucose levels. In addition, the higher CRH and higher corticosterone levels of the knockout mice were lost following this denervation. Finally, acute phase proteins were altered, plasma Gpx3 was lower, and major urinary protein MUP18 and its gene expression were higher in renal Glut2 knockout mice. Overall, the main conclusion that afferent signaling from the kidney is required for renal Glut2-dependent increases in endogenous glucose production is well supported by these findings.
Strengths:
An important strength of the paper is the novelty of the identification of kidney to brain communication as being important for glucose homeostasis. Previous studies had focused on other functions of the kidney modulated by or modulating brain function. This work is likely to promote interest in CNS pathways that respond to afferent renal signals and the response of the HPA axis to glucosuria. Additional strengths of this paper stem from the use of incisive techniques. Specifically, the authors use isotope enabled measurement of endogenous glucose production by GC-MS/MS, capsaicin ablation of afferent renal nerves, and multifiber recording from the renal nerve. The authors also paid excellent attention to rigor in the design and performance of these studies. For example, they used appropriate surgical controls, confirmed denervation through renal pelvic CGRP measurement, and avoided the confounding effects of nerve regrowth over time. These factors strengthen confidence in their results. Finally, humans with glucose transporter mutations and those being treated with SGLT2 inhibitors show a compensatory increase in endogenous glucose production. Therefore, this study strengthens the case for using renal Glut2 knockout mice as a model for understanding the physiology of these patients.
Comments on latest version:
My concerns have been addressed.
-
Author response:
The following is the authors’ response to the previous reviews.
Reviewer #1 (Public Review):
Summary:
In this study, Faniyan and colleagues build on their recent finding that renal Glut2 knockout mice display normal fasting blood glucose levels despite massive glucosuria. Renal Glut2 knockout mice were found to exhibit increased endogenous glucose production along with decreased hepatic metabolites associated with glucose metabolism. Crh mRNA levels were higher in the hypothalamus while circulating ACTH and corticosterone were elevated in this model. While these mice were able to maintain normal fasting glucose levels, ablating afferent renal signals to the brain caused low fasting blood glucose levels. In addition, the higher CRH and higher corticosterone levels of the knockout mice were lost following this denervation. Finally, acute phase proteins were altered, plasma Gpx3 was lower, and major urinary protein MUP18 and its gene expression were higher in renal Glut2 knockout mice. Overall, the main conclusion that afferent signaling from the kidney is required for renal Glut2-dependent increases in endogenous glucose production is well supported by these findings.
Strengths:
An important strength of the paper is the novelty of the identification of kidney to brain communication as being important for glucose homeostasis. Previous studies had focused on other functions of the kidney modulated by or modulating brain function. This work is likely to promote interest in CNS pathways that respond to afferent renal signals and the response of the HPA axis to glucosuria. Additional strengths of this paper stem from the use of incisive techniques. Specifically, the authors use isotope enabled measurement of endogenous glucose production by GC-MS/MS, capsaicin ablation of afferent renal nerves, and multifiber recording from the renal nerve. The authors also paid excellent attention to rigor in the design and performance of these studies. For example, they used appropriate surgical controls, confirmed denervation through renal pelvic CGRP measurement, and avoided the confounding effects of nerve regrowth over time. These factors strengthen confidence in their results. Finally, humans with glucose transporter mutations and those being treated with SGLT2 inhibitors show a compensatory increase in endogenous glucose production. Therefore, this study strengthens the case for using renal Glut2 knockout mice as a model for understanding the physiology of these patients.
Weaknesses:
A few weaknesses exist. Most concerns relate to the interpretation of this study's findings. The authors state that loss of glucose in urine is sensed as a biological threat based on the HPA axis activation seen in this mouse model. This interpretation is understandable but speculative. Importantly, whether stress hormones mediate the increase in endogenous glucose production in this model and in humans with altered glucose transporter function remains to be demonstrated conclusively. For example, the paper found several other circulating and local factors that could be causal. This model is also unable to shed light on how elevated stress hormones might interact with insulin resistance, which is known to increase endogenous glucose production. That issue is of substantial clinical relevance for patients with T2D and metabolic disease. Finally, how these findings can contribute to improving the efficiency of drugs like SGLT2 inhibitors remains to be seen.
- We agree with the reviewer’s overall assessment of this manuscript.
- Confirming the contribution of each secreted protein shown in Fig. 4, whose levels were changed between the two groups of mice, toward causing a compensatory increase in glucose production in response to elevated glycosuria is beyond the scope of this manuscript.
Reviewer #2 (Public Review):
Summary:
The authors previously generated renal Glut2 knockout mice, which have high levels of glycosuria but normal fasting glucose. They use this as an opportunity to investigate how compensatory mechanisms are engaged in response to glycosuria. They show that renal and hepatic glucose production, but not metabolism, is elevated in renal Glut2 male mice. They show that renal Glut2 male mice have elevated Crh mRNA in the hypothalamus, and elevated plasma levels of ACTH and corticosterone. They also show that temporary denervation of renal nerves leads to a decrease in fasting and fed blood glucose levels in female renal Glut2 mice, but not control mice. Finally, they perform plasma proteomics in male mice to identify plasma proteins that are changed (up or down) between the knockouts and controls.
Strengths:
The question that is trying to be addressed is clinically important: enhancing glycosuria is a current treatment for diabetes, but is limited in efficacy because of compensatory increases in glucose production.
Weaknesses:
(1) Although I appreciate that the initial characterization of the mice in another publication showed that both males and females have glycosuria, this does not mean that both sexes have the same mechanisms giving rise to glycosuria. There are many examples of sex differences in HPA activation in response to threat, for example. There is an unfounded assumption here that males and females have the same underlying mechanisms of glycosuria that undermines the significance of the findings.
- We agree with the reviewer that although we didn’t observe sex differences in renal Glut2 KO mice in the context of glucose homeostasis, their response (or mechanisms) to elevated glycosuria in enhancing compensatory glucose production may be different between the sexes. Therefore, we have included this limitation in the discussion section.
(2) The authors state that they induced the Glut2 knockout with tamoxifen as in their previous publication. The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout. This means that the last dose of tamoxifen was delivered 14 days prior to the experimental endpoint of each experiment. This seems like an important experimental constraint that should be discussed in this manuscript. Is the glycosuria that follows Glut2 knockout only a temporary change? If so, then the long-term change in glycosuria that follows SGLT2 inhibition in humans might not be best modelled by this knockout. Please specify when the surgeries to implant a jugular catheter or ablate the renal nerves were performed relative to the Glut2 knockout in the Methods.
- The reviewer’s statement ‘The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout’ is incorrect. In the referred publication, we had explicitly mentioned in methods, ‘All of the experiments, except those using a diet-induced obesity mouse model or noted otherwise, were completed within 14 days of inducing the Glut2 deficiency.’ Please see figures 5h-l and 6 in the cited publication, which demonstrate that not all the experiments were completed within 14 days of inducing renal Glut2 deficiency. Per the reviewer’s advice, in the present manuscript we have included the timeline (which in some cases is 4 months beyond inducing glycosuria) in all the figure legends. In addition, for a separate project (which is unpublished) we have measured glycosuria up to 1 year after inducing renal Glut2 deficiency. Therefore, the glycosuria observed in the renal Glut2 KO mice is not temporary.
(3) I am still unclear what group was used for controls. Are these wild-type mice who receive tamoxifen? Are they KspCadCreERT2;Glut2loxP/loxP mice who do not receive tamoxifen? This is important and needs to be specified.
- In our previous response to the reviewer, we had already mentioned which control group was used in this study. Please see our response to the second reviewer’s point 3. As mentioned to the reviewer, we had used Glut2loxp/loxp mice as the control group, which is also described multiple times in the figure legends of our previous paper that reported the phenotype of renal Glut2 KO mice. Per the reviewer’s advice, we have provided the information again in a revised version of this manuscript.
(4) The authors should report some additional control measures for the renal denervation that could also impact blood glucose and perhaps some of their other measures. The control measures, which one would like to see unimpacted by renal denervation, include body weights, food consumption and water intake, and glycosuria itself.
- Please also see fig. 3 in the present manuscript that demonstrates renal afferent denervation doesn’t influence baseline blood glucose or plasma insulin levels. We have now also mentioned in the text that the denervation doesn’t affect food intake or bodyweight.
(5) The graphical abstract shows a link between the hypothalamus and the liver that is completely unsupported by any of the current findings. That arrow should be removed.
- Because we observed an increase in hepatic glucose production in renal Glut2 KO mice (Fig. 1) - which was reduced by 50% after selective afferent renal denervation (Fig. 3) - in the graphical abstract we are suggesting a neural connection between the kidney-brain-liver or an endocrine factor(s) to account for these changes in blood glucose levels as also described in the discussion section. We can include a question mark ‘?’ in the graphical abstract to show that further studies are needed to validate these proposed mechanisms; however, we cannot just remove the arrow as advised by the reviewer.
(6) Though the authors have toned down their language implying a causal link between the HPA measures and compensatory elevation of blood glucose in the face of glycosuria, the title still implies this causal link. It is still the case that their data do not support causation. There are many potential ways to establish a causal link but those experiments are not performed here. The renal afferents are correlated with Crh content of the PVN, but nothing has been done to show that the Crh content is important for elevating blood glucose. In light of this, the title should be toned down. Perhaps something like "Renal nerves maintain blood glucose production and elevated HPA activity in response to glycosuria". The link between HPA and glucose is not shown in this paper.
- We request the reviewer to take a look at figure 1, showing an increase in glucose production in renal Glut2 KO mice and figure 3, which demonstrates that an afferent renal denervation reduces blood glucose levels by 50%. The afferent renal denervation (ablation of afferent renal nerves) does reduce blood glucose levels in renal Glut2 KO mice. Therefore, the use of the word ‘promote’ in the title is accurate and appropriate to reflect the role of the afferent renal nerves in contributing to about 50% increase in blood glucose levels in renal Glut2 KO mice.
- Regarding the reviewer's comment on changes in Crh gene expression, please look at figure 3. Ablation of renal afferent nerves decreases hypothalamic Crh gene expression and other mediators of the HPA axis by 50%. Therefore, the afferent renal nerves do contribute to regulating blood glucose levels, at least in part, by the HPA axis (which is widely known to change blood glucose levels). The use of words such as ‘required’ or ‘necessary’ in the title may have indicated causal role or could have been misleading here; therefore, we have purposely used ‘promote’ in the title to accurately reflect the findings of this study.
Recommendations for the authors:
Reviewer #1 (Recommendations For The Authors):
I have only minor text corrections to add:
- line 223 "A list"
- line 253 "independent"
- line 271 "the body's"
- line 304 "do not"
Yes, we have corrected these errors in a revised version of this manuscript.
Reviewer #2 (Recommendations For The Authors):
(1) Please report the dilutions used, if any, for the ELISAs. If the samples were run neat, please report this. Many manufacturers' instructions say that the user must determine the correct dilution to use for the samples collected. Also, sometimes when small blood volumes are collected, samples must be diluted to achieve the minimum volume required for the assay. It is not sufficient to refer the reader to the manufacturer's instructions.
- Per the reviewer’s advice, we have included the dilutions used for each assay in the methods.
-
-
www.biorxiv.org www.biorxiv.org
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1:
Point 1: The authors have demonstrated that Cs9g12620 contains the EBE of PthA4 in the promoter region, to show that PthA4 controls Cs9g12620, the authors need to compare to the wild type Xcc and pthA4 mutant for Cs9g12620 expression. The data in Figure 1 is not enough.
The data in Figure 1 D and E show a pthA4 Tn5 insertion mutant Mxac126-80 and the expression level of Cs9g12620 in citrus inoculated with the pthA4 mutant.
Point 2: The authors confirmed the interaction between PthA4 and the EBE in the promoter of Cs9g12620 using DNA electrophoretic mobility shift assay (EMSA). However, Figure 2B is not convincing. The lane without GST-PthA4 also clearly showed a mobility shift. For the EMSA assay, the authors need also to include a non-labeled probe as a competitor to verify the specificity. The description of the EMSA in this paper suggests that it was not done properly. It is suggested the authors redo this EMSA assay following the protocol: Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions PMID: 17703195.
Thank you very much for your comments. We have repeated the EMSA analysis based on your suggestion. The DNA probe was labeled with Cy5, and a non-labeled probe was included as a competitor (Figure 3B and D; Figure 4B and E).
Point 3: The authors also claimed that PthA4 suppresses the promoter activity of Cs9g12620. The data is not convincing and also contradicts their own data that overexpression of Cs9g12620 causes canker and silencing of it reduces canker, considering PthA4 is required for canker development. The authors conducted the assays using transient expression of PthA4. It should be done with Xcc wild type, pthA4 mutant, and negative control to inoculate citrus plants to check the expression of Cs9g12620.
We have measured Cs9g12620 expression in the silenced citrus plants inoculated with wild-type Xcc 29-1 (Figure 7F).
Point 4: Figure 6 AB is not convincing. There are no apparent differences. The variations shown in B are common in different wild-type samples. It is suggested that the authors conduct transgenic instead of transient overexpression.
It has been shown that transient expression of PthA4 leads to a canker-like phenotype, suggesting that this experiment is effective. However, the result would be more convincing with transgenic plants overexpressing pthA4 and Cs9g12620. We will generate these plants in follow-up research to confirm the phenotype.
Point 5: Gene silencing data needs more appropriate controls. Figure D seems to suggest canker symptoms actually happen for the RNAi treated. The authors need to make sure the same amount of Xcc was used for both CTV empty vector and the RNAi. It is suggested a blink test is needed here.
We used the same amount of Xcc to inoculate both the CTV empty-vector plants and the RNAi plants. In both inoculations, the cultured Xcc cells were suspended in sterile distilled water to a final concentration of 10^8 CFU/mL (OD600 = 0.3).
Point 6: Figure 1. Please draw a figure to clearly show the location of the EBE in the promoter of Cs9g12620, including the transcription start site, and translational start site.
The EBE in the Cs9g12620 promoter is underlined in Figure supplement 1. We are not sure about the transcription start site, but the translation start site is labelled.
-
eLife assessment
This valuable study provides new insight into potential subtle dynamics in effector biology. The data presented generally support the claims, but in some cases, significant controls are missing and so the overall work is currently incomplete. If the limitations can be addressed, this work should be of broad relevance for biologists interested in molecular plant-microbe interactions.
-
Reviewer #1 (Public Review):
Previous Review:
The authors have identified the predicted EBE of PthA4 in the promoter of Cs9g12620, which is induced by Xcc. The authors identified a homolog of Cs9g12620, which has variations in the promoter region. The authors show PthA4 suppresses Cs9g12620 promoter activity independent of the binding action. The authors also show that CsLOB1 binds to the promoter of Cs9g12620. Interestingly, the authors show that PthA4 interacts with CsLOB1 at protein level. Finally, it shows that Cs9g12620 is important for canker symptoms. Overall, this study has reported some interesting discoveries and the writing is generally well done. However, the discoveries are affected by the reliability of the data and some flaws of the experimental designs.
Here are some major issues:
The authors have demonstrated that Cs9g12620 contains the EBE of PthA4 in its promoter region. To show that PthA4 controls Cs9g12620, the authors need to compare Cs9g12620 expression in response to wild-type Xcc and the pthA4 mutant. The data in Figure 1 are not enough.
The authors confirmed the interaction between PthA4 and the EBE in the promoter of Cs9g12620 using a DNA electrophoretic mobility shift assay (EMSA). However, Fig. 2B is not convincing. The lane without GST-PthA4 also clearly showed a mobility shift. For the EMSA, the authors also need to include a non-labeled probe as a competitor to verify specificity. The description of the EMSA in this paper suggests that it was not done properly. It is suggested that the authors redo the EMSA following the protocol: Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions, PMID: 17703195.
The authors also claimed that PthA4 suppresses the promoter activity of Cs9g12620. The data are not convincing and also contradict their own data that overexpression of Cs9g12620 causes canker and silencing of it reduces canker, considering that PthA4 is required for canker development. The authors conducted the assays using transient expression of PthA4. They should instead inoculate citrus plants with wild-type Xcc, the pthA4 mutant, and a negative control to check the expression of Cs9g12620.
Fig. 6A and B are not convincing. There are no apparent differences. The variations shown in B are common in different wild-type samples. It is suggested that the authors use transgenic rather than transient overexpression.
Gene silencing data need more appropriate controls. Fig. D seems to suggest that canker symptoms actually occur in the RNAi-treated plants. The authors need to make sure the same amount of Xcc was used for both the CTV empty vector and the RNAi. It is suggested a blink test is needed here.
Comments on revised version:
Point 1: Addressed well.
Point 2: The EMSA was repeated with unlabeled DNA added; however, the results are still not convincing. Firstly, in Fig. 3D lane 5, with the absence of unlabeled DNA, the shifted bound signal wasn't reduced significantly. Secondly, still in Fig. 3D lane 5, the free labeled DNA probe at the bottom of the gel didn't increase. Together, these observations mean that the unlabeled DNA was unable to compete with the labeled DNA and that the "bound" shifted bands might not be true positives.
Point 3: The authors didn't address the question clearly regarding the connection between the inhibition of Cs9g12620 promoter by PthA4 and the positive function of Cs9g12620 on citrus canker.
Point 4: The comment was not addressed. Fig. 7A and B are not convincing. Firstly, no evidence is shown for expression of the transiently expressed genes. Secondly, it is hard to tell the difference in 7A. Thirdly, since CsLOB1 positively regulates Cs9g12620, why is expression of CsLOB1 unable to cause a phenotype, while expression of PthA4 does?
Point 5: addressed.
-
-
www.biorxiv.org www.biorxiv.org
-
Reviewer #1 (Public Review):
Summary:
Authors explore how sex-peptide (SP) affects post-mating behaviours in adult females, such as receptivity and egg laying. This study identifies different neurons in the adult brain and the VNC that become activated by SP, largely by using an intersectional gene expression approach (split-GAL4) to narrow down the specific neurons involved. They confirm that SP binds to the well-known Sex Peptide Receptor (SPR), initiating a cascade of physiological and behavioural changes related to receptivity and egg laying.
Areas of improvement and suggestions:
(1) "These results suggest the SP targets interneurons in the brain that feed into higher processing centers from different entry points likely representing different sensory input" and "All together, these data suggest that the abdominal ganglion harbors several distinct type of neurons involved in directing PMRs"
The characterization of the post-mating circuitry has been largely described by the group of Barry Dickson and other labs. I suggest ruling out a potential effect of mSP in any of the well-known post-mating neuronal circuitry, i.e., SPSN, SAG, pC1, vpoDN or OviDNs neurons. A combination of available split-Gal4 lines should be sufficient to prove this.
(2) The authors must show how specific their "head" (elav/otd-flp) and "trunk" (elav/tsh) expression of mSP is by showing images of the same constructs driving GFP.
(3) VT3280 is termed a SAG driver. However, VT3280 is an SPSN-specific driver (Feng et al., 2014; Jang et al., 2017; Scheunemann et al., 2019; Laturney et al., 2023). The authors should clarify this.
(4) Intersectional approaches must rule out the influence of SP on sex-peptide sensing neurons (SPSN) in the ovary by combining their constructs with an SPSN-Gal80 construct. In line with this, most of their lines target the SAG circuit (4I, J and K). Again, here they need to rule out the involvement of SPSN in their receptivity/egg laying phenotypes, especially because "In the female genital tract, these split-Gal4 combinations show expression in genital tract neurons with innervations running along oviduct and uterine walls (Figures S3A-S3E)".
(5) The authors separate head (brain) from trunk (VNC) responses, but they don't narrow down the neural circuits involved in each response. A detailed characterization of the involved circuits, especially in the case of the VNC, is needed to show (a) that the intersectional approach is indeed labelling distinct subtypes and (b) how these distinct neurons influence oviposition.
-
Reviewer #3 (Public Review):
Summary:
This paper reports new findings regarding neuronal circuitries responsible for female post-mating responses (PMRs) in Drosophila. The PMRs are induced by sex peptide (SP) transferred from males during mating. The authors sought to identify SP target neurons using a membrane-tethered SP (mSP) and a collection of GAL4 lines, each containing a fragment derived from the regulatory regions of the SPR, fru, and dsx genes involved in PMR. They identified several lines that induced PMR upon expression of mSP. Using split-GAL4 lines, they identified distinct SP-sensing neurons in the central brain and ventral nerve cord. Analyses of pre- and post-synaptic connection using retro- and trans-Tango placed SP target neurons at the interface of sensory processing interneurons that connect to two common post-synaptic processing neuronal populations in the brain. The authors proposed that SP interferes with the processing of sensory inputs from multiple modalities.
Strengths:
Besides the main results described in the summary above, the authors discovered the following:
(1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.
(2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.
Weaknesses:
(1) Intersectional expression involving ppk-GAL4-DBD was negative in all GAL4AD lines (Supp. Fig.S5). As the authors mentioned, ppk neurons may not intersect with SPR, fru, dsx, and FD6 neurons in inducing PMRs by mSP. However, since there was no PMR induction and no GAL4 expression at all in any combination with GAL4-AD lines used in this study, I would like to have a positive control, where intersectional expression of mSP in ppk-GAL4-DBD and other GAL4-AD lines (e.g., ppk-GAL4-AD) would induce PMR.
(2) The results of SPR RNAi knock-down experiments are inconclusive (Figure 5). SPR RNAi cancelled the PMR in dsx ∩ fru11/12 and partially in SPR8 ∩ fru 11/12 neurons. SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive; it is unclear whether SPR mediates the phenotype in SPR8 ∩ fru 11/12 and dsx ∩ SPR8 neurons.
SPR RNAi knock-down experiments may also help clarify whether mSP acted in an autocrine or juxtacrine manner to induce PMR. mSP may produce juxtacrine signaling, which is cell non-autonomous.
-
eLife assessment
This valuable study provides new insights into the neural circuits involved in post-mating responses in Drosophila females. It presents convincing evidence that the circuits for mating receptivity and egg-laying are distinct. A more thorough discussion regarding the integration of the new findings into the current understanding of post-mating behavior as well as clarification of some experimental details would further improve the manuscript.
-
Reviewer #2 (Public Review):
Sex peptide (SP) transferred during mating from male to female induces various physiological responses in the receiving female. Among those, the increase in oviposition and decrease in sexual receptivity are very remarkable. Naturally, a long-standing and significant question is the identity of the sex peptide target neurons that express the SP receptor and underlie these responses. Identification of these neurons will eventually lead to the identification of the underlying neuronal circuitry.
The Soller lab has addressed this important question already several years ago (Haussmann et al. 2013), using relevant GAL4-lines and membrane-tethered SP. The results already showed that the action of SP on receptivity and oviposition is mediated by different neuronal subsets and hence can be separated. The GAL4-lines used at that time were, however, broad, and the individual identity of the relevant neurons remained unclear.
In the present paper, Nallasivan and colleagues carried this analysis one step further, using new intersectional approaches and transsynaptic tracing.
Strength:
The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. Though this result is not entirely unexpected, it is novel as it was not shown before.
Weakness:
Though the analysis identifies a small set of neurons underlying SP responses, it does not go the last step to individually identify at least a few of them. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). At least these suggested identities could have been confirmed by straightforward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect.
-
Author response:
Reviewer #1 (Public Review):
Areas of improvement and suggestions:
(1) "These results suggest the SP targets interneurons in the brain that feed into higher processing centers from different entry points likely representing different sensory input" and "All together, these data suggest that the abdominal ganglion harbors several distinct type of neurons involved in directing PMRs"
The characterization of the post-mating circuitry has been largely described by the group of Barry Dickson and other labs. I suggest ruling out a potential effect of mSP in any of the well-known post-mating neuronal circuitry, i.e., SPSN, SAG, pC1, vpoDN or OviDNs neurons. A combination of available split-Gal4 lines should be sufficient to prove this.
Indeed, we have already tested drivers for some of these neurons and agree that this information is important to distinguish neurons that are direct SP targets from neurons that are involved in directing reproductive behaviors.
(2) Authors must show how specific is their "head" (elav/otd-flp) and "trunk" (elav/tsh) expression of mSP by showing images of the same constructs driving GFP.
The expression pattern for tshGAL, which expresses in the trunk, is already published (Soller et al., 2006). We will add images for “head” expression.
(3) VT3280 is termed as a SAG driver. However, VT3280 is a SPSN specific driver (Feng et al., 2014; Jang et al., 2017; Scheunemann et al., 2019; Laturney et al., 2023). The authors should clarify this.
According to the reviewer's suggestion, we will clarify the specificity of VT3280.
(4) Intersectional approaches must rule out the influence of SP on sex-peptide sensing neurons (SPSN) in the ovary by combining their constructs with an SPSN-Gal80 construct. In line with this, most of their lines target the SAG circuit (4I, J and K). Again, here they need to rule out the involvement of SPSN in their receptivity/egg laying phenotypes, especially because "In the female genital tract, these split-Gal4 combinations show expression in genital tract neurons with innervations running along oviduct and uterine walls (Figures S3A-S3E)".
We agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.
In principle, use of GAL80 is a valid approach to restrict expression if levels of GAL80 are higher than those of GAL4, because GAL80 binds GAL4 to inhibit its activity. Hence, if levels of GAL80 are lower, results could be difficult to interpret.
(5) The authors separate head (brain) from trunk (VNC) responses, but they don't narrow down the neural circuits involved in each response. A detailed characterization of the involved circuits, especially in the case of the VNC, is needed to show (a) that the intersectional approach is indeed labelling distinct subtypes and (b) how these distinct neurons influence oviposition.
Again, we agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.
Reviewer #2 (Public Review):
Strength:
The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. Though this result is not entirely unexpected, it is novel as it was not shown before.
We thank reviewer 2 for recognizing the advance of our work.
Weakness:
Though the analysis identifies a small set of neurons underlying SP responses, it does not go the last step to individually identify at least a few of them. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). At least these suggested identities could have been confirmed by straightforward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect.
We appreciate this reviewer's recognition of our previous work showing that receptivity and oviposition are separable. As pointed out, we have now gone one step further and identified, in a tour de force approach, subsets of neurons in the brain and VNC.
We agree with this reviewer that we need a higher resolution of expression to only one cell type. As pointed out by this reviewer, determining the neurochemical identity is an excellent suggestion and will help to further restrict expression to just one type of neuron. However, this is a major task that we will continue in follow-up studies.
Reviewer #3 (Public Review):
Strengths:
Besides the main results described in the summary above, the authors discovered the following:
(1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.
(2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.
We thank reviewer 3 for recognizing these two important points regarding the SP response that point to a revised model for how the underlying circuitry induces the post-mating response.
Weaknesses:
(1) Intersectional expression involving ppk-GAL4-DBD was negative in all GAL4AD lines (Supp. Fig.S5). As the authors mentioned, ppk neurons may not intersect with SPR, fru, dsx, and FD6 neurons in inducing PMRs by mSP. However, since there was no PMR induction and no GAL4 expression at all in any combination with GAL4-AD lines used in this study, I would like to have a positive control, where intersectional expression of mSP in ppk-GAL4-DBD and other GAL4-AD lines (e.g., ppk-GAL4-AD) would induce PMR.
We will add positive controls for ppk-DBD expression and expand the discussion section.
(2) The results of SPR RNAi knock-down experiments are inconclusive (Figure 5). SPR RNAi cancelled the PMR in dsx ∩ fru11/12 and partially in SPR8 ∩ fru 11/12 neurons. SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive; it is unclear whether SPR mediates the phenotype in SPR8 ∩ fru 11/12 and dsx ∩ SPR8 neurons.
We agree with this reviewer that the interpretation of the SPR RNAi results is complicated by the fact that SP has additional receptors (Haussmann et al 2013). With respect to oviposition, the results are conclusive for all three intersections when expressing UAS mSP together with SPR RNAi, i.e. egg laying is not induced in the absence of SPR. For receptivity, the results are conclusive for dsx ∩ fru11/12 and partially for SPR8 ∩ fru 11/12.
Potentially, SPR RNAi knock-down does not sufficiently reduce SPR levels to completely reduce receptivity in some intersection patterns, likely also because splitGal4 expression is less efficient.
Why SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive is unclear, but we anticipate that we need a higher resolution of expression to only one cell type to resolve this unexpected result. However, this is a major task that we will continue in follow up studies.
SPR RNAi knock-down experiments may also help clarify whether mSP acted in an autocrine or juxtacrine manner to induce PMR. mSP may produce juxtacrine signaling, which is cell non-autonomous.
Whether membrane-tethered SP induces the response in an autocrine manner is an important aspect in the interpretation of the results from mSP expression.
Removing SPR by SPR RNAi while expressing mSP in the same neurons did not induce egg laying for any of the three intersections and did not reduce receptivity for dsx ∩ fru11/12 and for SPR8 ∩ fru 11/12. Accordingly, we can conclude that for these neurons the response is induced in an autocrine manner.
We will add this aspect to the discussion section.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This study highlights an important discovery: a bacterial pathogen's effector influences plant responses that in turn affect how the leafhopper insect vector for the bacteria is attracted to the plants in a sex-dependent manner. The research is backed by convincing physiological and transcriptome analyses. This study unveils a complex interdependence between the pathogen effector, male leafhoppers, and a plant transcription factor in modulating female attraction to the plant, shedding light on previously unexplored aspects of plant-bacteria-insect interactions.
-
Reviewer #1 (Public Review):
Summary:
Orlovski and his colleagues revealed an interesting phenomenon: exposure of SAP54-overexpressing leaves to leafhopper males is required for the subsequent attraction of females. By transcriptomic analysis, they demonstrated that SAP54 effectively suppresses biotic stress response pathways in leaves exposed to the males. Furthermore, they clarified how SAP54, by targeting SVP, heightens leaf vulnerability to leafhopper males, thus facilitating female attraction and subsequent plant colonization by the insects.
Strengths:
The phenomenon of this study is interesting and exciting.
Weaknesses:
The underlying mechanisms of this phenomenon are not convincing.
-
Reviewer #2 (Public Review):
Summary:
In this study, the authors show that leaf exposure to leafhopper males is required for female attraction in the SAP54-expressing plant. They clarify how SAP54, by degrading SVP, suppresses biotic stress response pathways in leaves exposed to the males, thus facilitating female attraction and plant colonization.
Strengths:
This study suggests the possibility that the attraction of insect vectors to leaves is the major function of SAP54, and the induction of the leaf-like flowers may be a side-effect of the degradation of MTFs and SVP. It is a very surprising discovery that only male insect vectors can effectively suppress the plant's biotic stress response pathway. Although there has been interest in the phyllody symptoms induced by SAP54, the purpose, and advantage of secreting SAP54 were unknown. The results of this study shed light on the significance of secreted proteins in the phytoplasma life cycle and should be highly evaluated.
Weaknesses:
One weakness of this study is that the mechanisms by which male and female leafhoppers differentially affect plant defense responses remain unclear, although I understand that this is a future study.
The authors show that female feeding suppresses female colonization on SAP54-expressing plants. This is also an intriguing phenomenon but this study doesn't explain its molecular mechanism (Figure 7).
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This important body of work uses state-of-the-art quantitative methods to characterize and compare behaviors across five different fish species to understand which features are conserved and which ones are differentiated. The results from this study will potentially be of interest to ethologists and also have potential utility in understanding the neural mechanisms leading to these behaviors. While some claims are supported with compelling evidence, there are a few results that need further justification or qualification.
-
Reviewer #1 (Public Review):
Summary:
The authors used video tracking of 4 larval cichlid species and medaka to quantify prey-capture behaviors.
Strengths:
Comparing these behaviors is in principle an interesting question, and helps to address the typicality of the much better-understood zebrafish model. The authors make a good effort to analyze their data quantitatively.
Weaknesses:
(1) The overall conclusion, as summarized in the abstract as "Together, our study documents the diversification of locomotor and oculomotor adaptations among hunting teleost larvae" is not that compelling. What would be much more interesting would be to directly relate these differences to different ecological niches (e.g. different types of natural prey, visual scene conditions, height in water column etc), and/or differences in neural circuit mechanisms. While I appreciate that this paper provides a first step on this path, by itself it seems on the verge of stamp collecting, i.e. collecting and cataloging observations without a clear, overarching hypothesis or theoretical framework.
(2) The data to support some of the claims is either weak or lacking entirely.
-
Reviewer #2 (Public Review):
Summary:
This is a fascinating study about the behavioral kinematics of prey capture in larvae of several fish species (zebrafish, four cichlid species, and medaka). The authors describe in great detail swimming kinematics, hunting movement, eye movement as well as prey capture kinematics across these species. One striking finding is that cichlids and zebrafish use binocular vision to hunt for prey whereas medaka uses a monocular hunting style with a sideways motion to capture prey. The behavioral variation described in this study forms a strong foundation for future studies on the mechanisms underlying variation in hunting styles.
Strengths:
In general, the paper is well-written and documents very interesting data. The authors used sophisticated analyses that help appreciate the complexity of the behaviors examined. The discussion attempts to place the paper in a broader, comparative context. Overall, this paper reveals novel insight into an important behavior across different teleost species and lays a foundation for future studies on the neural and genetic basis of these distinct swimming and hunting behaviors.
Weaknesses:
The paper is rather descriptive in nature, although more context is provided in the discussion. Most figures are great, but I think the authors could add a couple of visual aids in certain places to explain how certain components were measured.
-
Reviewer #3 (Public Review):
Summary:
This paper uses 2D pose estimation and quantitative behavioral analyses to compare patterns of prey capture behavior used by six species of freshwater larval fish, including zebrafish, medaka, and four cichlids. The convincing comparison of tail and eye kinematics during hunts reveals that cichlids and zebrafish use binocular vision and similar hunting strategies, but that cichlids make use of an expanded set of action types. The authors also provide convincing evidence that medaka instead use monocular vision during hunts. This finding has important implications for the evolution of distinct distance estimation algorithms used by larval teleost fish species during prey capture.
Strengths:
The quality of the behavioral data is solid and the high frame rate allowed for careful quantification and comparison of eye and tail dynamics during hunts. The statistical approach to assess eye vergence states (Figure 2B) is elegant, the cross-species comparison of prey location throughout each hunt phase is well done (Figure 3B-D), and the demonstration that swim bout tail kinematics from diverse species can be embedded in a shared "canonical" principal component space to explain most of the variance in 2D postural dynamics for each species (Figure 4A-C) provides a simple and powerful framework for future studies of behavioral diversification across fish species.
Weaknesses:
More evidence is needed to assess the types of visual monocular depth cues used by medaka fish to estimate prey location, but that is beyond the scope of this compelling paper. For example, medaka may estimate depth through knowledge of expected prey size, accommodation, defocus blur, ocular parallax, and/or other possible algorithms to complement cues from motion parallax.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This is a valuable paper that identifies a potential challenge for embryos during fertilization: holding sperm contents in the fertilized embryos away from the oocyte meiotic spindle so that they don't get ejected into the polar body during meiotic chromosome segregation. The authors identify proteins involved in cytoplasmic streaming and maintaining the grouping of paternal organelles as being critical for this process. There remain minor weaknesses in the data presented but the paper provides solid evidence for the majority of its claims, and while the findings may pertain to a narrow audience the tools used and basic characterization shown will likely be relied upon by many in the community and therefore is of high value.
-
Joint Public Review:
Summary:
This paper by Beath et al. identifies a potential regulatory role for proteins involved in cytoplasmic streaming and maintaining the grouping of paternal organelles: holding sperm contents in the fertilized embryos away from the oocyte meiotic spindle so that they don't get ejected into the polar body during meiotic chromosome segregation. The authors show by time-lapse video that paternal mitochondria (used as a readout for sperm and its genome) are excluded from yolk granules and maternal mitochondria, even when moving long distances by cytoplasmic streaming. To understand how this exclusion is accomplished, they first show that it is independent of both internal packing and the engulfment of the paternal chromosomes by the maternal endoplasmic reticulum creating an impermeable barrier. They then test whether the control of cytoplasmic streaming affects this exclusion by knocking down two microtubule motors, Katanin and kinesin I. They find that the ER ring, which is used as a proxy for paternal chromosomes, undergoes extensive displacement with these treatments during anaphase I and interacts with the meiotic spindle, supporting their hypothesis that the exclusion of paternal chromosomes is regulated by cytoplasmic streaming. Next, they test whether a regulator of maternal ER organization, ATX-2, disrupts sperm organization so that they can combine the double depletion of ATX-2 and KLP-7, presumably because klp-7 RNAi (unlike mei-1 RNAi) does not affect polar body extrusion and they can report on what happens to paternal chromosomes. They find that the knockdown of both ATX-2 and KLP-7 produces a higher incidence of what appears to be the capture of paternal chromosomes by the meiotic spindle (5/24 vs 1/25). However, this capture event appears to halt the cell cycle, preventing the authors from directly observing whether this would result in the paternal chromosomes being ejected into the polar body.
The authors addressed the vast majority of the Reviewer's comments including the addition of new figures, re-wording of data interpretation and discussion points to better reflect the claims of the paper. There remain a few outstanding points which were not addressed.
In many cases the number of embryos analyzed or events captured remains low, and the authors conclude that these sample sizes prevented statistical significance. It's not clear whether analyzing more embryos or recording more capture events would lead to statistical significance. Language capturing this caveat should also be included in the manuscript. A specific example of this is given below:
In the double knockdown of ATX-2 and KLP-7, there was no significant difference between single and double knockdowns, and the ER ring displacement was not analyzed in this double mutant. Further, there was no difference in the frequency of sperm capture between single and double ATX-2 and KLP-7 depletions due to low sample size. The strength of the conclusion of this manuscript would be greatly improved if both of these results were further explored.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
This paper by Beath et al. identifies a potential regulatory role for proteins involved in cytoplasmic streaming and maintaining the grouping of paternal organelles: holding sperm contents in the fertilized embryos away from the oocyte meiotic spindle so that they don't get ejected into the polar body during meiotic chromosome segregation. The authors show by time-lapse video that paternal mitochondria (used as a readout for sperm and its genome) are excluded from yolk granules and maternal mitochondria, even when moving long distances by cytoplasmic streaming. To understand how this exclusion is accomplished, they first show that it is independent of both internal packing and the engulfment of the paternal chromosomes by maternal endoplasmic reticulum creating an impermeable barrier. They then test whether the control of cytoplasmic streaming affects this exclusion by knocking down two microtubule motors, katanin and kinesin-13. They find that the ER ring, which is used as a proxy for paternal chromosomes, undergoes extensive displacement with these treatments during anaphase I and interacts with the meiotic spindle, supporting their hypothesis that the exclusion of paternal chromosomes is regulated by cytoplasmic streaming. Next, they test whether a regulator of maternal ER organization, ATX-2, disrupts sperm organization so that they can combine the double depletion of ATX-2 and KLP-7, presumably because klp-7 RNAi (unlike mei-1 RNAi) does not affect polar body extrusion and they can report on what happens to paternal chromosomes. They find that the knockdown of both ATX-2 and KLP-7 produces a higher incidence of what appears to be the capture of paternal chromosomes by the meiotic spindle (5/24 vs 1/25). However, this capture event appears to halt the cell cycle, preventing the authors from directly observing whether this would result in the paternal chromosomes being ejected into the polar body.
Strengths:
This is a useful, descriptive paper that highlights a potential challenge for embryos during fertilization: when fertilization results in the resumption of meiotic divisions, how are the paternal and maternal genomes kept apart so that the maternal genome can undergo chromosome segregation and polar body extrusion without endangering the paternal genome? In general, the experiments are well-executed and analyzed. In particular, the authors' use of multiple ways to knock down ATX-2 shows rigor.
Weaknesses:
The paper makes a case that this regulation may be important but the authors should do some additional work to make this case more convincing and accessible for those outside the field. In particular, some of the figures could include greater detail to support their conclusions, they could explain the rationale for some experiments better and they could perform some additional control experiments with their double depletion experiments to better support their interpretations. Also, the authors' inability to assess the functional biological consequences of the capture of the sperm genome by the oocyte spindle should be discussed, particularly in light of the cell cycle arrest that they observe.
These general comments are addressed in the more specific critiques below.
Reviewer #2 (Public Review):
Summary
In this manuscript, Beath et al. use primarily C. elegans zygotes to test the overarching hypothesis that cytoplasmic mechanisms exist to prevent interaction between paternal chromosomes and the meiotic spindle, which are present in a shared zygotic cytoplasm after fertilization. Previous work, much of it by this group, had characterized cytoplasmic streaming in the zygote and the behavior of paternal components shortly after fertilization, primarily the clustering of paternal mitochondria and membranous organelles around the paternal chromosomes. This work set out to identify the molecular mechanisms responsible for that clustering and test the specific hypothesis that the "paternal cloud" helps prevent the association of paternal chromosomes with the meiotic spindle.
Strengths
This work is a collection of technical achievements. The data are primarily 3- and 4-channel time-lapse images of zygotes shortly after fertilization, which were performed inside intact animals. There are many instances in which the experiments show extreme technical skill, such as tracking the paternal chromosomes over large displacements throughout the volume of the embryo. The authors employ a wide variety of fluorescent reporters to provide a remarkably clear picture of what is going on in the zygote. These reagents and the novel characterization of these stages that they provide will be widely beneficial to the community.
The data provide direct visualization of what had previously been a mostly hypothetical structure, the "paternal cloud," using simultaneous labeling of paternal DNA and mitochondria in combination with a variety of maternal proteins including maternal mitochondria, yolk granules, tubulin, and plasma membrane. Together, these images provided convincing evidence of the existence of this specified cytoplasmic domain. They go on to show that the knockdown of the ataxin-2 homolog ATX-2, a protein previously shown to affect ER dynamics, disrupted the paternal cloud, identifying a role for ER organization in this structure.
The authors then used the system to test the functional consequences of perturbing the cytoplasmic organization. Consistent with the paternal cloud being a stable structure, it stayed intact during large movements the authors generated using previously published knockdowns (of mei-1/katanin and kinesin-13/klp-7) that increased cytoplasmic streaming. They used this data to document instances in which the paternal chromosomes were likely to have been attached to the spindle. They concluded with direct evidence of spindle fibers connecting to the paternal chromatin upon knockdown of ATX-2 in combination with increased cytoplasmic streaming, providing strong, direct support for their overarching hypothesis.
Weaknesses
While the data is convincing, the narrative of the paper could be streamlined to highlight the novelty of the experiments and better articulate the aims. For example, the cloud of paternal mitochondria and membranous organelles was previously shown, but Figures 1-2 largely reiterate that observation. The innovation seems to be that the combination of ER, yolk, and maternal mitochondrial markers makes the existence of a specified domain more concrete. There are also some instances where more description is needed to make the conclusions from the images clear.
These general comments are addressed in the more specific critiques below.
The manuscript intersperses what read like basic characterizations of fluorescent markers that, as written, can distract from the main story. The authors characterized the dynamics of ER organization throughout the substages of meiosis and the permeability of the envelope of ER that surrounds the paternal chromatin, but it could be more clearly established how the ability to visualize these structures allowed them to address their aims.
We have added the following after the initial description of ER morphology changes: (ER morphology was used to determine cell-cycle stages during live imaging reported below in Fig. 6.)
More background on what was previously known about ER organization in M-phase and the role of ataxin proteins specifically may help provide more continuity.
We have added references to transitions to ER sheets during mitotic M-phase in HeLa cells and Xenopus extracts.
Reviewer #3 (Public Review):
Summary:
This study by Beath et al. investigated the mechanisms by which sperm DNA is excluded from the meiotic spindle after fertilization. Time-lapse imaging revealed that sperm DNA is surrounded by paternal mitochondria and maternal ER that is permeable to proteins. By increasing cytoplasmic streaming using kinesin-13 or katanin RNAi, the authors demonstrated that limiting cytoplasmic streaming in the embryo is an important step that prevents the capture of sperm DNA by the oocyte meiotic spindle. Further experiments showed that the Ataxin-2 protein is required to hold paternal mitochondria together and close to the sperm DNA. Finally, double depletion of kinesin-13 and Ataxin-2 suggested an increased risk of meiotic spindle capture of sperm DNA.
Overall, this is an interesting finding that could provide a new understanding of how meiotic spindle capture of sperm DNA and its accidental expulsion into the polar body is prevented. However, some conceptual gaps need to be addressed and further experiments and improved data analyses would strengthen the paper.
- It would be helpful if the authors could discuss in good detail how they think maternal ER surrounds the sperm DNA
We have added 2 references to papers about nuclear envelope re-assembly from Shirin Bahmanyar’s lab and suggest the ER envelope is a halted intermediate in nuclear envelope reassembly.
and why is it not disrupted following Ataxin disruption.
We have been attempting to disrupt ER structures in the meiotic embryo for the last 5 years by depleting profilin, BiP, atlastin, ATX-2 and by optogenetically packing ER into a ball in the middle of the oocyte. None of these treatments prevent envelopment of the sperm DNA by maternal ER. None of these treatments remove ER from the spindle envelope and none remove ER from the plasma membrane. These treatments mostly result in “large aggregates” of ER that we have not examined by EM. Wild speculation: any disruption of the ER strong enough to prevent ER envelopment around chromatin would be sterile because the M to S transition in the mitotic zone of the germline would be blocked. Rapid depletion of ATX-2 to the extent shown by rigorous data in this manuscript does not prevent ER envelopment around chromatin. We chose not to speculate about the reasons for this because we do not know why.
- Since important phenotypes revealed in RNAi experiments (e.g. kinesin-13 and ataxin-2 double depletion) are not very robust, the authors should consider toning down their conclusions and revising some of their section headings. I appreciate that they are upfront about some limitations, but they do nonetheless make strong concluding sentences.
We have changed the discussion of the klp-7 atx-2 double depletion to: “The capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos suggests that the integrity of the exclusion zone around the sperm DNA might insulate the sperm DNA from spindle microtubules. However, a much larger number of klp-7(RNAi) singly depleted and atx-2(degron) singly depleted time-lapse sequences are needed to rigorously support this idea.”
- The discussion section could be improved further to present the authors' findings in the larger context of current knowledge in the field.
We have expanded the discussion as suggested.
- The authors previously demonstrated that F-actin prevents meiotic spindle capture of sperm DNA in this system. However, the current manuscript does not discuss how the katanin, kinesin-13 and Ataxin-2 mechanisms could work together with previously established functions of F-actin in this process.
We have added pfn-1(RNAi) to the discussion section.
- How can the authors exclude off-target effects in their RNAi depletion experiments? Can kinesin-13, katanin, and Ataxin phenotypes be rescued for instance?
For ataxin-2 phenotypes, two completely independent controls for off-target effects are shown: GFP(RNAi) on a strain with an endogenous ATX-2::GFP tag vs GFP(RNAi) on a strain with no tag on ATX-2, and ATX-2::AID with or without auxin. For kinesin-13 and katanin, we did not do a rigorous control for off-target effects of RNAi. However, the effects of these depletions on cytoplasmic microtubules have been previously reported by others.
- How are the authors able to determine if the paternal genome was actually captured by the spindle? Does lack of movement definitively suggest capture without using a spindle marker?
mKate::tubulin labels the spindle in each capture event. This can be seen in Video S3 for mei-1(RNAi) and Figure 9 for atx-2 klp-7 double depletions.
(1) Major issues:
The images provided are not convincing that mitochondria are entirely excluded from the regions with yolk granules. Please provide insets of magnified images of the paternal mitochondria in Figure 1E to more clearly show the exclusion even when paternal mitochondria are streaming. Providing grayscale images, individual z-sections and/or some quantification of this data might also be more convincing to this reviewer.
We have modified Fig. 1 by adding single wavelength magnified insets to more clearly show that paternal mitochondria are in a “black hole” in the maternal yolk granules during cytoplasmic streaming.
Figure 2 - This figure can be retitled to highlight that the paternal organelle cloud is impermeable to mitochondria and conserved.
The legend has been re-titled as suggested.
Figure 3B - An image of the DNA within the ring of maternal ER would strengthen the authors' claims, especially since the maternal ER ring is used as a proxy for the paternal chromosomes in later figures.
We have added a panel showing DAPI-stained DNA in the center of the ER ring and paternal mitochondria cloud.
Why is the faster time scale imaging significant? I think this could be more clearly set up in the paper. Perhaps rapid imaging of maternal mito-labeled kca-1(RNAi) embryos would better show the difference in time scale, with the expectation that the paternal cloud forms and persists while the ER invades.
We are not sure what the reviewer means. 5 sec time intervals were used throughout the paper. We are also not sure how kca-1(RNAi) would help. Movement of the entire oocyte into and out of the spermatheca is what limits the ability to keep a fusing sperm in focus. kca-1(RNAi) would prevent cytoplasmic streaming but not ovulation movements.
Figure 4 - The question about the permeability of the ER envelope seems to come out of nowhere as written. It isn't clear how it contributes to the larger story about preventing sperm incorporation in the spindle.
This section of the results is introduced with: “If the maternal ER envelope around sperm DNA was sealed and impermeable during meiosis, this could both prevent the sperm DNA from inducing ectopic spindle assembly and prevent the sperm DNA from interacting with meiotic spindle microtubules.”
The data in Figure 4 would probably not be expected to be in this paper based on the paper title. Maybe the title needs something about ER dynamics, e.g. "ATX-2 but not an ER envelope isolates the paternal chromatin"?
In Figure 5, it seems that RNAi of klp-7 and mei-1 had slightly different effects on short-axis displacement of the ER envelope (klp-7 affecting it more dramatically than mei-1) and slightly different effects on interaction with the meiotic spindle (capture vs streaming past the spindle). The authors mention in their discussion that the difference in the interaction with the meiotic spindle might reflect the effects that loss of MEI-1 may have on the spindle, but could it also be a consequence of the differences in cytoplasmic streaming observed?
With our current data, the only statistically significant difference between cytoplasmic streaming of the sperm contents in mei-1(RNAi) vs klp-7(RNAi) is that excessive streaming persists longer into metaphase II in klp-7(RNAi). We have added a sentence describing this difference to the results. If differences in streaming were the cause of different capture frequencies, then klp-7(RNAi) would cause more capture events than mei-1(RNAi) but the opposite was observed. We have avoided too much discussion here because the frequency of capture events is too low to demonstrate statistically significant differences between mei-1(RNAi), klp-7(RNAi), and atx-2(degron) + klp-7(RNAi) without a very large increase in the number of time-lapse sequences.
Also, the authors should find a way to represent this interaction with the meiotic spindle in a quantitative or table form to allow the reader to observe some of the patterns they report more easily.
We have added a table to Fig. 9 that summarizes capture data.
Finally, can the authors report when they observe the closest association with the meiotic spindle: Does it correlate with the period of greatest displacement (AI) or are they unlinked?
The low frequency of capture events makes it difficult to test this rigorously.
Figure 6 - 'Endogenously tagged ATX-2 was observed throughout oocytes and meiotic embryos without partial co-localization with ER.' How can the authors exclude co-localization with ER?
We have changed the wording to: “Endogenously tagged ATX-2 was observed throughout oocytes and meiotic embryos (Fig. 6A; Fig. S2). ATX-2 did not uniquely co-localize with ER (Fig. S2).”
The rationale for why the authors think that the integrity of sperm organelles is important to keep the genomes apart is not clear to this reviewer and needs to be explained better. Moving the discussion of the displacement experiments in Figure S3 from the end of the results section to the ATX-2 knockdown section would help accomplish this.
We have added the sentence: “The frequency of sperm capture by the meiotic spindle (Fig. 9D) was significantly higher than wild-type controls in klp-7(RNAi) atx-2(AID) double depleted embryos (p=0.011 Fisher’s exact test). Although the number of single mutant embryos analyzed was too low to demonstrate a significant difference between single and double mutant embryos, these results qualitatively support the hypothesis that limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria are both important for preventing capture events between the meiotic spindle and sperm DNA.”
It looks like, in the double knockdown of ATX-2 and KLP-7, the spread of paternal mitochondria is less affected than when only ATX-2 is depleted. What effect does this result have on the observation that the incidence of sperm capture appears to increase in the double depletion? What does displacement of the ER ring look like in the double depletion? Is it additive, consistent with their interpretation that both limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria is required to keep the genomes separate?
We cannot show a significant difference between single and double knockdowns without increasing n by a lot. We did not analyze ER ring displacement in the double mutant.
Is the increased incidence of capture in the double-depleted embryos significant?
We have added the sentence: “The frequency of sperm capture by the meiotic spindle (Fig. 9D) was significantly higher than wild-type controls in klp-7(RNAi) atx-2(AID) double depleted embryos (p=0.011 Fisher’s exact test). Although the number of single mutant embryos analyzed was too low to demonstrate a significant difference between single and double mutant embryos, these results qualitatively support the hypothesis that limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria are both important for preventing capture events between the meiotic spindle and sperm DNA.”
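As a rough illustration of why counts this small are hard to separate statistically, the sketch below runs a two-sided Fisher's exact test on the capture frequencies quoted in the reviewer's summary (5/24 vs 1/25). This is only a sketch under stated assumptions: the source does not say which comparison group the 1/25 refers to, it is not the authors' own calculation (their reported p=0.011 is against wild-type controls, whose counts are not given here), and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are (capture, no capture).
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total number of capture events
    total = comb(row1 + row2, col1)   # number of ways to place the captures

    def prob(k):
        # Hypergeometric probability of k captures falling in the first group
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table at least as unlikely as the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Assumed counts from the reviewer's summary: 5 captures in 24 embryos vs 1 in 25
p_value = fisher_exact_two_sided(5, 19, 1, 24)
print(f"two-sided p = {p_value:.3f}")
```

With these assumed counts the two-sided p-value comes out at roughly 0.1, above the conventional 0.05 threshold, which is in line with the statement that far more time-lapse sequences would be needed to distinguish groups with capture frequencies this low.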
What do the authors make of the cell cycle arrest observed when paternal chromosomes are captured? Is there an argument to be made that this arrest supports the idea that preventing this capture is actively regulated and therefore functionally important?
We chose not to discuss the mechanism of this arrest because considerably more work would be required to prove that it is not caused by a combination of imaging conditions and genotype. The low frequency of these capture + arrest events would make it very difficult to show that the arrest does not occur after depleting a checkpoint protein.
(2) Minor concerns:
Top of page 4: "streaming because depletion tubulin stops cytoplasmic streaming (7)" should be "streaming because depletion of tubulin stops cytoplasmic streaming (7)"
The “of” has been inserted.
Page 6: "This result indicated that the volume of paternal mitochondria excludes maternal mitochondria and yolk granules but not maternal ER." The authors have only shown this for maternal mitochondria, not yolk granules.
We have deleted the mention of yolk granules here.
Page 7: "These results suggest that all maternal membranes are initially excluded from the sperm at fusion." Should be "These results show that maternal ER are initially excluded from the sperm at fusion. Since maternal mitochondria and yolk granules are excluded later, this suggests that all maternal membranes are initially excluded from the sperm at fusion."
We have changed this sentence as suggested.
It's not clear why the authors show other types of movement that might be quantified when cytoplasmic streaming is affected in Figure 5A and only quantify long-axis and short-axis displacement.
We have deleted the other types of movement from the schematic. Although these parameters were quantified, we did not include this data in the results so it would be confusing for the reader to have them in the schematic.
Bottom of page 7: Mention that the GFP::BAF-1 was maternally provided.
We have added “Maternally provided..”
Missing an Arrow on Figure 1A 9:20.
We removed the text citation to an arrow in Fig. 1A because we moved most of the description of the ER ring to Fig. 3 to address other reviewer suggestions.
Supplemental videos should be labeled appropriately to indicate what structures are labeled. It is currently difficult to understand what is being shown.
(3) Issues with the Discussion section:
"The simplest explanation is that cytoplasm does not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm." - Citation page 12.
We have changed the sentence to: “The simplest hypothesis is that maternal and paternal cytoplasm might not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm.”
"The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubule" - Pages 12-13 reference the figures.
This sentence has been rewritten in response to other comments but the new sentence now references revised Fig. 9.
"ATX-2 is required to maintain the integrity of the ball of paternal mitochondria around the sperm DNA, but the mechanism is unknown." - Page 13 reference figure.
A reference to Figs 7 and 8 has been inserted.
" In control embryos, the sperm contents rarely came near the meiotic spindle in agreement with a previous study that found that male and female pronuclei rarely form next to each other (6). Streaming of the sperm contents was most commonly restricted to a jostling motion with little net displacement, circular streaming in the short axis of the embryo, or long axis streaming in which the sperm turned away from the spindle before the halfway point of the embryo. Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)." - Page 13, the corresponding figures need to be referenced for these sentences.
We have inserted figure references.
"In capture events observed after double depletion of ATX-2 and KLP-7, a bundle of microtubules was discernible extending from the spindle into the ER envelope surrounding the sperm DNA. Such bundles were not observed in mei-1(RNAi) capture events, likely because of the previously reported low density of microtubules in mei-1(RNAi) spindles (36, 37)." - Pages 13-14 references figures here.
We have inserted figure references.
"The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubules." - This should be toned down since this phenotype is not robust.
We have changed this to: “The capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos suggests that the integrity of the exclusion zone around the sperm DNA might insulate the sperm DNA from spindle microtubules. However, a much larger number of klp-7(RNAi) singly depleted and atx-2(degron) singly depleted time-lapse sequences are needed to rigorously support this idea.”
ATX-2 depletion alters ER morphology but does not impact the maternal ER envelope - could the authors provide a potential explanation for this?
In the discussion, we cite papers showing that ATX-2 depletion affects many different cellular processes so the effect we see on paternal mitochondria might have nothing to do with the ER ring. We have been attempting to disrupt ER structures in the meiotic embryo for the last 5 years by depleting profilin, BiP, atlastin, ATX-2 and by optogenetically packing ER into a ball in the middle of the oocyte. None of these treatments prevent envelopment of the sperm DNA by maternal ER. None of these treatments remove ER from the spindle envelope and none remove ER from the plasma membrane. These treatments mostly result in “large aggregates” of ER that we have not examined by EM. Wild speculation: any disruption of the ER strong enough to prevent ER envelopment around chromatin would be sterile because the M to S transition in the mitotic zone of the germline would be blocked. Rapid depletion of ATX-2 to the extent shown by rigorous data in this manuscript does not prevent ER envelopment around chromatin. We chose not to speculate about the reasons for this because we do not know why.
It would be good to have representative images of what the altered spindle looks like in MEI-1-depleted oocytes.
The structure of MEI-1-depleted spindles has been described in the cited references.
"Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)" - It is intriguing that this does not happen in the double depletion experiments of kinesin-13 and ATX-2. The authors should perhaps discuss this.
This does happen in KLP-7 ATX-2 double depleted embryos as shown in Fig. 9.
(4) Missing citations:
"This analysis was restricted to embryos from anaphase I through anaphase II because our streaming data and that of Kimura 2020 indicate that the sperm contents have not moved significantly before anaphase I." - This needs an appropriate citation. Page 10.
We have inserted citations here.
" The simplest explanation is that cytoplasm does not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm." - Citation page 12. Not referencing figures in the discussion.
We have changed the sentence to: “The simplest hypothesis is that maternal and paternal cytoplasm might not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm.”
"The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubule" - Pages 12-13 reference the figures.
A reference to the revised Fig. 9 has been inserted in the revised version of this sentence.
"ATX-2 is required to maintain the integrity of the ball of paternal mitochondria around the sperm DNA, but the mechanism is unknown." - Page 13 reference figure.
References to Figs. 7 and 8 have been inserted.
" In control embryos, the sperm contents rarely came near the meiotic spindle in agreement with a previous study that found that male and female pronuclei rarely form next to each other (6). Streaming of the sperm contents was most commonly restricted to a jostling motion with little net displacement, circular streaming in the short axis of the embryo, or long axis streaming in which the sperm turned away from the spindle before the halfway point of the embryo. Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)." Page 13, the corresponding figures need to be referenced for these sentences.
We have inserted citations here.
"In capture events observed after double depletion of ATX-2 and KLP-7, a bundle of microtubules was discernible extending from the spindle into the ER envelope surrounding the sperm DNA. Such bundles were not observed in mei-1(RNAi) capture events, likely because of the previously reported low density of microtubules in mei-1(RNAi) spindles (36, 37)." Pages 13-14 references figures here.
We have inserted citations here.
(5) Referencing wrong figures in the text:
Figure 5 - In the figure legend there is a 5C but there is no 5C panel in the figure.
A C has been inserted in Fig. 5.
Figure 6A - "Dark holes were observed suggesting exclusion from the lumens of larger membranous organelles (Fig. 6A; Fig. S2)." Page 10.
6A has been changed to 6C.
Figure 6A is showing background autofluorescence in WT oocytes so I am not certain why it is cited here.
The Figure citation has been corrected to 6B, C.
Figure 8 - I could not find the supplemental data file with the individual mitochondria distance measurements.
We are including the Excel file with the revised submission.
The last sentence of the first paragraph should be re-worded to be more concise ". In C. elegans, the nucleus is positioned away from the site of future fertilization so that the meiosis I spindle assembles at the opposite end of the ellipsoid zygote from the site of fertilization (2-4). "
Every word of this sentence is important.
Last sentence second paragraph typo "These microtubules are thought to drive meiotic cytoplasmic streaming because depletion tubulin stops cytoplasmic streaming (7) and depletion of the microtubule-severing protein katanin by RNAi results in an increased mass of cortical microtubules and an increase in cytoplasmic streaming (8)." Pages 3-4.
“of” has been inserted.
(6) Typos in the introduction should be corrected:
Ataxin or kinesin-13 are not mentioned in the introduction but these are a big focus of the paper.
Gong et al 2024 written instead of number citation (page 5), no citation in References.
This has been corrected.
Supplemental videos should be labeled appropriately to indicate what structures are labeled. It is currently difficult to understand what is being shown.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This fundamental study provides a near-comprehensive anatomical description and annotation of neurons in a male Drosophila ventral nerve cord, based on large-scale circuit reconstruction from electron microscopy. This connectome resource will be of substantial interest to neuroscientists interested in sensorimotor control, neural development, and analysis of brain connectivity. However, although the evidence is extensive and compelling, the presentation of results in this very large manuscript lacks clarity and concision.
-
Reviewer #1 (Public Review):
Summary: The authors present a close to complete annotation of the male Drosophila ventral nerve cord, a critical part of the fly's central nervous system.
Strengths: The manuscript describes an enormous amount of work that takes the first steps towards presenting and comprehending the complexity and organization of the ventral nerve cord. The analysis is thorough and complete. It also makes the effort to connect this EM-centric view of the nervous system to more classical analyses, such as the previously defined hemilineages, that also describe the organization of the fly nervous system. There are many, many insights that come from this work that will be valuable to the field for the foreseeable future.
Weaknesses: With more than 60 primary figures, the paper is overwhelming and cannot be read and digested in a single sitting. The result is more like a detailed resource rather than a typical research paper.
-
Reviewer #2 (Public Review):
Summary and strengths:
This massive paper describes the identity and connectivity of neurons reconstructed from a volumetric EM image volume of the ventral nerve cord (VNC) of a male fruit fly. The segmentation of the EM data was described in one companion paper; the classification of the neurons entering the VNC from the brain (descending neurons or DNs) and the motor neurons leaving the VNC was described in a second companion paper. Here, the authors describe a system for annotating the remaining neurons in the VNC, which include intrinsic neurons, ascending neurons, and sensory neurons, representing the vast majority of neurons in the dataset. Another fundamental contribution of this paper is the identification of the developmental origins (hemilineage) of each intrinsic neuron in the VNC. These comprehensive hemilineage annotations can be used to understand the relationship between development and circuit structure, provide insight into neurotransmitter identity, and facilitate comparisons across insect species. Many sensory neurons are also annotated by comparison to past literature. Overall, defining and applying this annotation system provides the field with a standard nomenclature and resource for future studies of VNC anatomy, connectivity, and development. This is a monumental effort that will fundamentally transform the field of Drosophila neuroscience and provide a roadmap for similar connectomic studies in other organisms.
Weaknesses:
Despite the significant merit of these contributions, the manuscript is challenging to read and comprehend. In some places, it seems to be attempting to comprehensively document everything the authors found in this immense dataset. In other places, there are gaps in scholarship and analysis. As it is currently constructed, I worry that the manuscript will intimidate general readers looking for an entry point to the system, and ostracize specialized readers who are unable to use the paper as a comprehensive reference due to its confusing organization.
The bulk of the 559 pages of the submitted paper is taken up by a set of dashboard figures for each of ~40 hemilineages. Formatting the paper as an eLife publication will certainly help condense these supplemental figures into a more manageable format, but 68 primary figures will remain, and many of these also lack quality and clarity. Without articulating a clear function for each plot, it is hard to know what the authors missed or chose not to show. As an example, many of the axis labels indicate the hemilineage of a group of neurons, but are ordered haphazardly and so small as to be illegible; if the hemilineage name is too small, and in a bespoke order for that data, then is the reader meant to ignore the specific hemilineage labels?
The text has similar problems of emphasis. It is often meandering and repetitive. Overlapping information is found in multiple places, which causes the paper to be much longer than it needs to be. For example, the concept of hemilineages is introduced three times before the subtitle "Introduction to hemilineage-based organisation". When cell typing is introduced, it is unclear how this relates to serial motif, hemilineage, etc; "Secondary hemilineages" follow the Cell typing title. Like the overwhelming number of graphical elements, this gives the impression that little attention has been paid to curating and editing the text. It is unclear whether the authors intend for the paper to be read linearly or used as a reference. In addition, descriptions of the naming system are often followed by extensive caveats and exceptions, giving the impression that the system is not airtight and possibly fluid. At many points, the text vacillates between careful consideration of the dataset's limitations and overly grandiose claims. These presentation flaws overshadow the paper's fundamental contribution of describing a reasonable and useful cell-typing system and placing intrinsic neurons within this framework.
References to past Drosophila literature are inconsistent and references to work from other insects are generally not included; for example, the extensive past work on leg sensory neurons in locusts, cockroaches, and stick insects. Such omissions are understandable in a situation where brevity is paramount. However, this paper adopts a comprehensive and authoritative tone that gives the reader an impression of completeness that does not hold up under careful scrutiny.
The paper accompanies the release of the MANC dataset (EM images, segmentation, annotations) through a web browser-based tool: clio.janelia.org. The paper would be improved by distilling it down to its core elements, and then encouraging readers to explore the dataset through this interactive interface. Streamlining the paper by removing extraneous and incomplete analyses would provide the reader with a conceptual or practical framework on which to base their own queries of the connectome.
-
Author response:
eLife assessment
This fundamental study provides a near-comprehensive anatomical description and annotation of neurons in a male Drosophila ventral nerve cord, based on large-scale circuit reconstruction from electron microscopy. This connectome resource will be of substantial interest to neuroscientists interested in sensorimotor control, neural development, and analysis of brain connectivity. However, although the evidence is extensive and compelling, the presentation of results in this very large manuscript lacks clarity and concision.
We thank the reviewers for their detailed and thoughtful feedback and the time that they invested to provide it. Organising this manuscript (which is clearly not a standard research article) was quite challenging as it had to fulfil a number of functions: presenting a guide to the system of annotations and the associated online resources; providing an atlas for the annotated cell types; and showcasing various analyses to illustrate the value of the dataset as well as just a few of the many questions it can be used to address. We gave careful consideration to its structure and attempted to signpost the sections that would be most useful to particular types of readers. Nevertheless we can see that this was not completely successful and we thank the reviewers for their suggestions for improvement.
We acknowledge that the resulting manuscript was very large and will endeavour to streamline our text in the revision without compromising the accessibility of the data. We do note that there is some precedent for comprehensive and lengthy connectome papers going all the way back to White et al. 1986 which took 340 pages to describe the 302 neurons of the C. elegans connectome. More recently, we can compare the “hemibrain papers” published in eLife: Scheffer et al., 2020, Li et al., 2020, Schlegel et al., 2021, Hulse et al., 2021. These papers would also be difficult to digest at a single sitting but were game-changing for the Drosophila neuroscience field and have already been cited hundreds of times, a testament to their utility. In the same way that these papers provided the first comprehensively proofread and annotated EM connectome for (a large part of) the adult fly brain, our work now provides the first fully proofread and annotated EM connectome for the nerve cord. Given the pioneering nature of this dataset we feel that the lengthy but highly structured atlas sections of the paper are justified and will prove impactful in the long term.
Whilst no EM dataset is perfect, we have endeavoured to make this one as comprehensive as possible. We found 74.4 million postsynapses and 15,765 neurons of VNC origin, all of which have been carefully proofread, reviewed, annotated and typed. For comparison, the female adult nerve cord dataset (FANC, Azevedo et al., Nature, 2024) contains roughly 45 million synapses and 14,600 neuronal cell bodies of which at the time of writing 5576 have received preliminary proofreading and 222 high quality proofreading. We emphasise that these are highly complementary datasets, given the difference in sex and the fact that each dataset has different artefacts (MANC has poorer preservation of neurons in the leg nerves; FANC is missing part of the abdominal ganglion and has lower synapse recovery). We reconstructed 5484 sensory neurons from the thoracic nerves, 84% of the ~6500 estimated from FANC. The overall recovery rate was ~86.5% if we include the ~1100 sensory neurons from abdominal nerves, which were in excellent condition.
Reviewer #1 (Public Review):
Summary:
The authors present a close to complete annotation of the male Drosophila ventral nerve cord, a critical part of the fly's central nervous system.
Strengths:
The manuscript describes an enormous amount of work that takes the first steps towards presenting and comprehending the complexity and organization of the ventral nerve cord. The analysis is thorough and complete. It also makes the effort to connect this EM-centric view of the nervous system to more classical analyses, such as the previously defined hemilineages, that also describe the organization of the fly nervous system. There are many, many insights that come from this work that will be valuable to the field for the foreseeable future.
We thank the reviewer for acknowledging the enormous collaborative effort represented by this manuscript. We tried to synthesise decades of light-level work by neuroscientists and developmental biologists working in Drosophila and other insects in order to create a standard, systematic nomenclature for >22,000 neurons, most of which had not been typed at light level. We hope that the MANC dataset and this guide to its contents will prove to be useful resources to Drosophila neurobiologists and the wider neuroscience field.
Weaknesses:
With more than 60 primary figures, the paper is overwhelming and cannot be read and digested in a single sitting. The result is more like a detailed resource rather than a typical research paper.
In writing this paper, we had two aims: first, to describe and validate our extensive biological annotation of the connectome and second, to provide interesting illustrative examples of the many analyses that could be carried out on this dataset using the atlas we generated. The resulting paper is intended primarily as a detailed reference rather than a typical research paper. At the end of the Introduction, we outline the structure of the paper and explicitly direct non-specialist readers to focus on the initial and concluding sections for orientation to the dataset so that they would not get bogged down in the details. We will review our section organisation and headings to try to make the paper more straightforward to navigate, and we will add specific figure numbers to the outline.
Reviewer #2 (Public Review):
Summary and strengths:
This massive paper describes the identity and connectivity of neurons reconstructed from a volumetric EM image volume of the ventral nerve cord (VNC) of a male fruit fly. The segmentation of the EM data was described in one companion paper; the classification of the neurons entering the VNC from the brain (descending neurons or DNs) and the motor neurons leaving the VNC was described in a second companion paper. Here, the authors describe a system for annotating the remaining neurons in the VNC, which include intrinsic neurons, ascending neurons, and sensory neurons, representing the vast majority of neurons in the dataset. Another fundamental contribution of this paper is the identification of the developmental origins (hemilineage) of each intrinsic neuron in the VNC. These comprehensive hemilineage annotations can be used to understand the relationship between development and circuit structure, provide insight into neurotransmitter identity, and facilitate comparisons across insect species. Many sensory neurons are also annotated by comparison to past literature. Overall, defining and applying this annotation system provides the field with a standard nomenclature and resource for future studies of VNC anatomy, connectivity, and development. This is a monumental effort that will fundamentally transform the field of Drosophila neuroscience and provide a roadmap for similar connectomic studies in other organisms.
We thank the reviewer for acknowledging the enormous collaborative effort represented by this manuscript. We tried to synthesise decades of light-level work by neuroscientists and developmental biologists working in Drosophila and other insects in order to create a standard, systematic nomenclature for >22,000 neurons, most of which had not been typed at light level. We hope that the MANC dataset and this guide to its contents will prove to be useful resources to Drosophila neurobiologists and the wider neuroscience field.
Weaknesses:
Despite the significant merit of these contributions, the manuscript is challenging to read and comprehend. In some places, it seems to be attempting to comprehensively document everything the authors found in this immense dataset. In other places, there are gaps in scholarship and analysis. As it is currently constructed, I worry that the manuscript will intimidate general readers looking for an entry point to the system, and ostracize specialized readers who are unable to use the paper as a comprehensive reference due to its confusing organization.
In writing this paper, we had two aims: first, to describe and validate our extensive biological annotation of the connectome and second, to provide interesting illustrative examples of the many analyses that could be carried out on this dataset using the atlas we generated. The resulting paper is intended primarily as a detailed reference rather than a typical research paper. At the end of the Introduction, we outline the structure of the paper and explicitly direct non-specialist readers to focus on the initial and concluding sections for orientation to the dataset so that they would not get bogged down in the details. We will review our section organisation and headings to try to make the paper more straightforward to navigate, and we will add specific figure numbers to the outline.
The bulk of the 559 pages of the submitted paper is taken up by a set of dashboard figures for each of ~40 hemilineages. Formatting the paper as an eLife publication will certainly help condense these supplemental figures into a more manageable format, but 68 primary figures will remain, and many of these also lack quality and clarity. Without articulating a clear function for each plot, it is hard to know what the authors missed or chose not to show. As an example, many of the axis labels indicate the hemilineage of a group of neurons, but are ordered haphazardly and so small as to be illegible; if the hemilineage name is too small, and in a bespoke order for that data, then is the reader meant to ignore the specific hemilineage labels?
We will contact eLife professional editing staff to determine whether the paper can be streamlined by moving more material to supplemental without making it difficult to locate the detailed catalogues of neurons that will be of interest to specialist readers. Based on the typical eLife format, we suspect that retaining the dashboard main figures for each hemilineage will be necessary to maintain its utility as a reference. We will, however, shorten the associated main text by, for example, moving background material used to assign the hemilineages to the Methods section and moving specific results to the figure legends where possible.
We articulated the function for each plot as follows: "Below we describe in more depth every hemilineage that produces more than one or two secondary neurons. For each of these 35 hemilineages, we show (A) the overall morphology of the secondary population, (B) representative individual neurons (as estimated by highest average NBLAST score to other members of the hemilineage), and (C) specific notable examples (which in some cases are primary). We then report (D) the locations of their connectors (postsynapses and presynapses), (E) their upstream and downstream partners by class, and (F) their upstream and downstream partners by finer subdivisions corresponding to their systematic types (secondary hemilineage, target, or sensory modality). We also provide supplementary figures showing the morphology and normalised up- and downstream connectivity of all systematic types for each hemilineage."
We have plotted every secondary neuron in each hemilineage, every predicted synapse for those neurons with confidence >0.5, every connection to partner neurons by class (no threshold applied), and then the same information organised by hemilineage in a heatmap (and including partners from all birthtimes and partners of unknown hemilineage). Then the supplementary figures show all connectivity, organised in the same way, for every individual cell type assigned to the hemilineage, including both primary and early secondary neurons. We will add more detail to the figure legends to clarify these points.
We apologise that you were unable to read some of the axis labels in the review copy of the manuscript; we did submit high resolution versions of the figures as a supplemental file, but perhaps this did not reach you; they can also be found at https://www.biorxiv.org/content/10.1101/2023.06.05.543407v2.supplementary-material. The hemilineages are in a conserved (alphanumerical) order for all hemilineage-specific plots and many others. The exceptions arise when neurons are clustered based on their connectivity to hemilineages, in which case the order of the labels necessarily follows the structure of the resulting clusters.
The text has similar problems of emphasis. It is often meandering and repetitive. Overlapping information is found in multiple places, which causes the paper to be much longer than it needs to be. For example, the concept of hemilineages is introduced three times before the subtitle "Introduction to hemilineage-based organisation". When cell typing is introduced, it is unclear how this relates to serial motif, hemilineage, etc; "Secondary hemilineages" follow the Cell typing title. Like the overwhelming number of graphical elements, this gives the impression that little attention has been paid to curating and editing the text. It is unclear whether the authors intend for the paper to be read linearly or used as a reference. In addition, descriptions of the naming system are often followed by extensive caveats and exceptions, giving the impression that the system is not airtight and possibly fluid. At many points, the text vacillates between careful consideration of the dataset's limitations and overly grandiose claims. These presentation flaws overshadow the paper's fundamental contribution of describing a reasonable and useful cell-typing system and placing intrinsic neurons within this framework.
Because we intended this paper to be read primarily as a reference, we tried to make each section stand on its own, which we agree resulted in some redundancy (with more details appearing where relevant). However, we will do our best to tighten the text for the version of record.
Our description immediately under the Cell typing title includes the use of hemilineage, serial (not serial motif, which was not used), and laterality (left-right homologues) in the procedure to assign cell types. We will change this to “Cell typing of intrinsic, ascending, and efferent neurons” for clarity. The “Secondary hemilineages” title marks the start of a new section that serves as a reference for each of the secondary hemilineages; we will change this to “Secondary hemilineage catalogue” or similar for clarity.
References to past Drosophila literature are inconsistent and references to work from other insects are generally not included; for example, the extensive past work on leg sensory neurons in locusts, cockroaches, and stick insects. Such omissions are understandable in a situation where brevity is paramount. However, this paper adopts a comprehensive and authoritative tone that gives the reader an impression of completeness that does not hold up under careful scrutiny.
We did not attempt to review the sensory neuron literature in this manuscript but rather cited those specific papers which included the axon morphology data that informed our modality, peripheral origin, and cell type assignments. Most of these came from the Drosophila literature due to the availability of genetic tools used for sparse labelling of specific populations as well as the greatly increased likelihood of conserved morphology. However, we certainly agree that decades of sensory neuron work in larger insects were foundational for this subfield, and we will add a sentence to this effect in the introduction to our sensory neuron typing.
The paper accompanies the release of the MANC dataset (EM images, segmentation, annotations) through a web browser-based tool: clio.janelia.org. The paper would be improved by distilling it down to its core elements, and then encouraging readers to explore the dataset through this interactive interface. Streamlining the paper by removing extraneous and incomplete analyses would provide the reader with a conceptual or practical framework on which to base their own queries of the connectome.
We certainly hope that this paper will encourage readers to explore the MANC dataset. Indeed, as we state in the Discussion, "Moreover, its ultimate utility depends on how widely it is leveraged in the future experimental and computational work of the entire neuroscience community. We have only revealed the tip of the iceberg in this report, with a wealth of opportunities now available in this publicly available dataset for forthcoming connectomic analyses that will feed into testable functional hypotheses." In the first few sections of the Results, we include a visual introduction to annotated features, a glossary of annotation terms, a visual guide to our cell typing nomenclature, and two video tutorials on the use of Clio Neuroglancer to query the dataset. To further encourage exploration, we have also included illustrative examples of just a few of the many analyses that can now be performed with this comprehensive and publicly available dataset.
-
-
www.biorxiv.org
-
Reviewer #2 (Public Review):
Summary:
The authors used four datasets spanning 30 countries to examine funding success and research quality score for various disciplines. They examined whether funding or research quality score were influenced by majority gender of the discipline and whether these affected men, women, or both within each discipline. They found that disciplines dominated by women have lower funding success and research quality score than disciplines dominated by men. These findings are surprising because even the men in women-dominated fields experienced lower funding success and research quality score.
Strengths:
- The authors utilized a comprehensive dataset covering 30 countries to explore the influence of the majority gender in academic disciplines on funding success and research quality scores.
- Findings suggest a systemic issue where disciplines with a higher proportion of women have lower evaluations and funding success for all researchers, regardless of gender.
- The manuscript is notable for its large sample size and the diverse international scope, enhancing the generalizability of the results.
- The work accounts for various factors including age, number of research outputs, and bibliometric measures, strengthening the validity of the findings.
- The manuscript raises important questions about unconscious bias in research evaluation and funding decisions, as evidenced by lower scores in women-dominated fields even for researchers that are men.
- The study provides a nuanced view of gender bias, showing that it is not limited to individuals but extends to entire disciplines, impacting the perception and funding and quality or worth of research.
- This work underscores the need to explore motivations behind gender distribution across fields, hinting at deep-rooted societal and institutional barriers.
- The authors have opened a discussion on potential solutions to counter bias, like adjusting funding paylines or anonymizing applications, or other practical solutions.
- While pointing out limitations such as the absence of data from major research-producing countries, the manuscript paves the way for future studies to examine whether its findings are universally applicable.
- The study carefully uses the existing data (including PBRF funding panel gender diversity) to examine gender bias. These types of datasets are often not readily accessible for analysis. Here, the authors have used the available data to the fullest extent possible.
The authors have addressed the concerns I had about the original version.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #2 (Public Review):
Summary:
The authors used four datasets spanning 30 countries to examine funding success and research quality score for various disciplines. They examined whether funding or research quality score were influenced by majority gender of the discipline and whether these affected men, women, or both within each discipline. They found that disciplines dominated by women have lower funding success and research quality score than disciplines dominated by men. These findings are surprising because even the men in women-dominated fields experienced lower funding success and research quality score.
Strengths:
- The authors utilized a comprehensive dataset covering 30 countries to explore the influence of the majority gender in academic disciplines on funding success and research quality scores.
- Findings suggest a systemic issue where disciplines with a higher proportion of women have lower evaluations and funding success for all researchers, regardless of gender.
- The manuscript is notable for its large sample size and the diverse international scope, enhancing the generalizability of the results.
- The work accounts for various factors including age, number of research outputs, and bibliometric measures, strengthening the validity of the findings.
- The manuscript raises important questions about unconscious bias in research evaluation and funding decisions, as evidenced by lower scores in women-dominated fields even for researchers that are men.
- The study provides a nuanced view of gender bias, showing that it is not limited to individuals but extends to entire disciplines, impacting the perception and funding and quality or worth of research.
- This work underscores the need to explore motivations behind gender distribution across fields, hinting at deep-rooted societal and institutional barriers.
- The authors have opened a discussion on potential solutions to counter bias, like adjusting funding paylines or anonymizing applications, or other practical solutions.
- While pointing out limitations such as the absence of data from major research-producing countries, the manuscript paves the way for future studies to examine whether its findings are universally applicable.
Weaknesses:
- The study does not provide data on the gender of grant reviewers or stakeholders, which could be critical for understanding potential unconscious bias in funding decisions. These data are likely not available; however, this could be discussed. Are grant reviewers in fields dominated by women more likely to be women?
- There could be more exploration into whether the research quality score is influenced by inherent biases towards disciplines themselves, rather than only being gender bias.
- The manuscript should discuss how non-binary gender identities were addressed in the research. There is an opportunity to understand the impact on this group.
- A significant limitation is absence of data from other major research-producing countries like China and the United States, raising questions about the generalizability of the findings. How comparable are the findings observed to these other countries?
- The motivations and barriers that drive gender distribution in various fields could be expanded on. Are fields striving to reach gender parity through hiring or other mechanisms?
- The authors could consider if the size of funding awards correlates with research scores, potentially overlooking a significant factor in the evaluation of research quality. Presumably there is less data on smaller 'pilot' funds and startup funds for disciplines where these are more common. Would funding success follow the same trend for these types of funds?
- The language used in the manuscript at times may perpetuate bias, particularly when discussing "lower quality disciplines," which could influence the reader's perception of certain fields.
- The manuscript does not clarify how many gender identities were represented in the datasets or how gender identity was determined, potentially conflating gender identity with biological sex.
Reviewer #3 (Public Review):
This study seeks to investigate one aspect of disparity in academia: how gender balance in a discipline is valued in terms of evaluated research quality score and funding success. This is important in understanding disparities within academia.
This study uses publicly available data to investigate covariation between gender balance in an academic discipline and:
i) Individual research quality scores of New Zealand academics as evaluated by one of 14 broader subject panels.
ii) Funding success in Australia, Canada, Europe, UK.
The study would benefit from further discussion of its limitations, and from the clarification of some technical points (as described in the recommendations for the authors).
Recommendations For The Authors:
Reviewer #2 (Recommendations For The Authors):
This is a very nice study as-is. In the following comments, I have mainly put my thoughts as I was reading the manuscript. If there are practical ways to answer my questions, I think they could improve the manuscript but the data required for this may not be available.
Are there any data on the gender of grant reviewers or stakeholders who make funding decisions?
The research quality score metrics seem to be more related to unconscious bias. The funding metrics may be as well, but there are potentially simple fixes (higher paylines for women or removing gender identities from applications).
We have included some details about PBRF funding panel gender diversity. These panels are usually more gender balanced than the field they represent, but in the extreme cases (Engineering, Education, Mathematics) they are skewed as would be expected. Panel data for other award decision makers were not available.
I wonder if the research score metric isn't necessarily reflecting on the gender bias in the discipline but rather on the discipline itself? Terms like "hard science" and "soft science" are frequently used and may perpetuate these biases. This is somewhat supported by the data - on lines 402-403 the authors state that women in male-dominated fields like Physics have the same expected score as a man. Could it be that Physics has a higher score than Education even if Physics was woman-dominated and Education was man-dominated? Are there any instances in the data where traditionally male- or female-dominated disciplines are outliers and happen to be the opposite? If so, in those cases, do the findings hold up?
Overall we would love to answer this question! But our data is not enough. We mention these points in the Discussion (Lines 472-466). We have extended this a little to cover the questions raised here.
How are those with non-binary gender identities handled in this article? If there is any data on the subject, I would be curious to know how this affects research score and funding success.
These data were either unavailable or the sample size was too small to be considered anonymously (Mentioned on Lines 74-76).
A limitation of the present article is a lack of data on major research-producing countries like China and the United States. Is there any data relevant to these or other countries? Is there reason to believe the findings outlined in this manuscript would apply or not apply to those countries also?
We would be very excited to see if the findings held up in other countries, particularly any that were less European based. Unfortunately we could not find any data to include. Maybe one day!
What are the motivations or other factors driving men to certain fields and women to certain fields over others? What are the active barriers preventing all fields from 50% gender parity?
Field choice is a highly studied area and the explanations are myriad; we have included a few references in the discussion section on job choice. I usually recommend my students read the blog post at
https://www.scientificamerican.com/blog/hot-planet/the-people-who-could-have-done-science-didnt/
It is very thoughtful but unfortunately not appropriate to reference here.
The authors find very interesting data on funding rates. Have you considered funding rates and the size of funding awards as a factor in research score? Some disciplines like biomedical science receive larger grants than others like education.
A very interesting thought for our next piece of work. We would definitely like to explore our hypothesis further.
There are instances where the authors' writing may perpetuate bias. If possible these should be avoided. One example is on line 458-459 where the authors state "...why these lower quality disciplines are more likely..." This could be re-written to emphasize that some disciplines are "perceived" as lower quality. Certainly those in these disciplines would not characterize their chosen discipline as "low quality".
Well-spotted! Now corrected as you suggest.
Similar to the preceding comment, the authors should use care with the term "gender". In the datasets used, how many gender identities were captured? How many gender identity options were given in the surveys or data intake forms? Could individuals in these datasets have been misgendered? Do the data truly represent gender identity or biological sex?
We know that in the PBRF dataset gender was a binary choice and transgender individuals were able to choose which group they identified with. There was no non-binary option (in its defence, the latest dataset is from 2018 and NZ has only recently started updating official forms to be more inclusive) and individuals with gender not stated (a very small number) were excluded. ARC did mention that a small number of individuals were either non-binary or gender not stated; again, these are not included here for reasons of anonymity. This is now mentioned on Lines 74-76. The effects on this group are important and understudied, likely because, as here, the numbers are too small to be included meaningfully.
Reviewer #3 (Recommendations For The Authors):
Major revisions:
Could you add line numbers to the Supplementary Materials for the next submission?
Yes! Sorry for the omission.
(1) In the main text L146 and Figure 1, it is not clear why the expected model output line is for a 50 year old male from University of Canterbury only, but the data points are from disciplines in all eight universities in New Zealand. I think it would be more clear and informative to report the trend lines that represent the data points. At the moment it is hard to visualise how the results apply to other age groups or universities.
As age and institution are linear variables with no interactions, they are only a constant adjustment above or below this line, and the adjustment is small in comparison to the linear trend. Unfortunately, including them graphically does not aid understanding. We agree that including raw data with an adjusted trend line can be confusing, but after a lot of between-author discussion this was the most informative compromise we could find (many people like raw data so we included it).
(2) Does your logistic regression model consider sample size weighting in pmen? Weighting according to sample sizes needs to be considered in your model. At the moment it is unclear and suggests a proportion between 0 and 1 only is used, with no weighting according to sample size. If using R, you can use glm(cbind(nFem, nMalFem).
Yes. All data points were weighted by group size exactly as you suggest. We have updated the text on Line 317 to make this clear.
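To illustrate what weighting by group size means in a logistic regression on grouped data, the following is a minimal sketch only. It is not the authors' Matlab code, and the function name, variable names and example numbers are all hypothetical. Each group contributes to the binomial likelihood in proportion to its number of trials, which is the same idea as passing success and failure counts (rather than bare proportions) to a GLM routine.

```python
import math


def fit_grouped_logistic(x, successes, trials, iters=100, tol=1e-12):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data by Newton's method.

    Each group enters the likelihood with its own number of trials, so a
    group with 1000 applications counts for more than one with 10.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, si, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = si - ni * p        # observed minus expected successes
            w = ni * p * (1.0 - p)     # binomial variance, scales with group size
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system H * delta = g explicitly.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: proportion of men per discipline, applications, awards.
p_men = [0.2, 0.4, 0.6, 0.8]
applications = [40, 250, 120, 900]
awards = [6, 45, 26, 230]
intercept, slope = fit_grouped_logistic(p_men, awards, applications)
```

Multiplying every group's counts by the same constant leaves the estimates unchanged, whereas changing the relative sizes of the groups shifts the fit toward the larger ones.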
(3) For PBRF, I think it is useful to outline the 14 assessment panels and the disciplines they consider. Did you include the assessment panel as an explanatory variable in your model too to investigate whether quality is assessed in the same manner between panels? If not, then suggest reasons for not doing so.
We have now included more detail in main text on the gender split of the panels. They were not included as an explanatory variable. In theory there was some cross-referencing of panel scores to ensure consistency as part of the PBRF quality assurance guidelines.
(4) There are several limitations which should be discussed more openly:
Patterns only represent the countries studied, not necessarily academia worldwide.
Mentioned on Line 485-487.
Gender is described as a binary variable.
Discussed on Line 74-76.
The measure of research evaluation as a reflection of academic merit.
This is acknowledged in the data limitations paragraph at the end of the Discussion.
Minor revisions:
(1) L186. Why do you analyse bibliometric differences between individuals from University of Canterbury only? It would be helpful to outline your reasons.
Although bibliometric data is publicly available it is difficult to collect for a large number of individuals. You also need some private data to match bibliometrics with PBRF data which is anonymous. We were only able to do this for our own institution with considerable internal support.
(2) How many data records did you have to exclude in L191 because they could not be linked? This is helpful to know how efficient the process was, should anyone else like to conduct similar studies.
We matched over 80% of available records (384 individuals). We have mentioned this on Line 194.
(3) Check grammar in the sentence beginning in L202.
Thank you. Corrected.
(4) Please provide a sample size gender breakdown for "University of Canterbury (UC) bibliometric data", as you do for the preceding section. A table format is helpful.
Included on Line 194.
(5) L377 I think this sentence needs revision.
Thank you, we have reworked that paragraph.
(6) L389-392 Is it possible evaluation panels can score women worse than men and that because more women are present in female-biassed disciplines, the research score in these are worse? Women scoring worse between fields, may be a result of some scaling to the mean score.
No. This is not possible because women in male-dominated fields score higher.
(7) L393 Could you discuss explanations for why men outperform women in research evaluation scores more when disciplines are female dominated?
Unfortunately, we don’t have an explanation for this and can’t get one from our data. We hope it will be an interesting topic for future work.
(8) Could the figures be improved by having the crosses, x and + scaled, for example, in thickness corresponding to sample size? Alternatively, some description of the sample size variation? Sorting the rows by order of pmen in Table E1 would also be helpful for the reader.
As with the previous figure, we have tried many ways of presenting it (including this one). Unfortunately nothing helped.
We have provided Table E1 as a spreadsheet to allow readers to do this themselves.
(9) Please state in your methods section the software used to aid repeatability.
This is now in Supplementary Materials (Matlab 2022b).
(10) It is great to report your model findings into real terms for PBRF and ARC. Please can you extend this to CIHR and EIGE. i.e. describing how a gender skew increase of x associates with a y increase in funding success chance.
We have added similar explanations for both these datasets comparing the advantage of being male with the advantage of working in a male dominated discipline.
(11) I would apply care to using pronouns "his" and "her" in L322-L324 and avoid if at all possible, instead, replacing them with "men" and "women".
We have updated the text to avoid these pronouns in most places.
The article in general would benefit from a disclosure statement early on conceding that gender investigated here is only as a binary variable, discounting its spectrum.
See Line 74-76.
Please also report how gender balance is defined in the datasets as in the data summary in supplementary materials, within the main text.
Our definition of gender balance (proportion of researchers who are men, pmen) is given on Line 103.
(12) The data summary Table S1 could benefit from explaining the variables in the first column. It is currently unclear how granularity, size of dataset and quotas/pre-allocation? are defined.
These lines have been removed as the information they contained is included elsewhere in the table with far better explanations!
(13) There are only 4 data points for investigating covariation between gender balance and funding success in CIHR. This should be discussed as a limitation.
The small size of the dataset is now mentioned on Line 348.
(14) L455 "Research varies widely across disciplines" in terms of what?
This sentence has been extended.
(15) L456 Maybe I am missing something but I don't understand the relevance of "Physicists' search for the grand unified theory" to research quality.
Removed.
(16) Can you provide more discussion into the results of your bibliographic analysis and Figure 2? An explanation into the relationships seen in the figure at least would be helpful.
Thank you, we have clarified the relationships seen in each of Figures 2A (Lines 226-235), 2B (Lines 236-252), and 2C (Lines 260-268).
(17) It would be helpful to include in the discussion a few more sentences outlining:
- Potential future research that would help disentangle mechanisms behind the trends you find.
- How this research could be applied. Should there be some effort to standardise?
We have added a short paragraph to the discussion about implications/applications, and future research (Lines 481-484).
(18) The introduction could benefit from discussing and explaining their a priori hypotheses for how research from female-biassed disciplines may be evaluated differently.
While not discussed in the introduction, possible explanations for why and how research in female dominated fields might be evaluated differently are explored in some detail in the Discussion. We think once is enough, and towards the end is more effective than at the beginning.
(19) L16 "Our work builds on others' findings that women's work is valued less, regardless of who performs that work." I find this confusing because in your model, there is a significant interaction effect between gender:pmen. This suggests that for female-biassed disciplines, there is even more of a devaluation for women, which I think your lines in figure 1 suggest.
Correct, but men are still affected, so the sentence is correct. What is confusing is that the finding is counter to what we might expect.
-
eLife Assessment
This study provides convincing evidence that the quality of research in female-dominated fields of research is systematically undervalued by the research community. The authors' findings are based on analyses of data from a research assessment exercise in New Zealand and data on funding success rates in Australia, Canada, the European Union and the United Kingdom. This work is an important contribution to the discourse on gender biases in academia, underlining the pervasive influence of gender on whole fields of research, as well as on individual researchers.
-
Reviewer #3 (Public Review):
This study seeks to investigate one aspect of disparity in academia: how gender balance in a discipline is valued in terms of evaluated research quality score and funding success. This is important in understanding disparities within academia.
This study uses publicly available data to investigate covariation between gender balance in an academic discipline and:
i) individual research quality scores of New Zealand academics as evaluated by one of 14 broader subject panels.
ii) funding success in Australia, Canada, Europe, UK.
The authors have addressed the concerns I had about the original version.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This study presents a valuable development of endometrial organoid culture methodology that mimics the window of implantation. Functional validation to demonstrate its robustness is lacking; therefore, the study is considered incomplete. The data may be interesting to embryologists and investigators working on reproductive biology and medicine.
-
Reviewer #1 (Public Review):
This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state.
The authors did their best to revise their study according to the reviewers' comments. However, the study remains unconvincing and at the same time too dense and not focused enough.
(1) The use of the term organoids is still confusing and should be avoided. Organoids are epithelial tissue-resembling structures. Hence, the multiple-cell aggregates developed here are rather "co-culture models" (or "assembloids"). It is still unexpected (unlikely) that these structures containing epithelial, stromal and immune cells can be robustly passaged in the epithelial growth conditions used. All other research groups developing real organoids from endometrium have shown that only the epithelial compartment remains in culture at passaging (while the stromal compartment is lost). If authors keep to their idea, they should perform scRNA-seq on both early and late (passage 6-10) "organoids". And they should provide details of culturing/passaging/plating etc. that are different from those of other groups and might explain why they keep stromal and immune cells in their culture for such a long time. In other words, they should then in detail compare their method to the standard method of all other researchers in the field, and show the differences in survival and growth of the stromal and immune cells.
(2) The paper is still much too dense, touching upon all kinds of conclusions from the manifold bioinformatic analyses. The latter should be much clearer and better described, and then some interesting findings (pathways/genes) should be highlighted without mentioning every single aspect that is observed. The paper needs a lot of editing to better focus and extract take-home messages, not bombing the reader with a mass of pathways, genes etc. which makes the manuscript just not readable or 'digest-able'. There is no explanation whatever and no clear rationale why certain genes are included in a list while others are not. There is the impression that mass bioinformatics is applied without enough focus.
(3) The study is much too descriptive and does not show functional validation or exploration (except glycogen production). Some interesting findings extracted from the bioinformatics must be functionally tested.
(4) In contrast to what was found in vivo (Wang et al. 2020), no abrupt change in gene expression pattern is mentioned here from the (early-)secretory to the WoI phase. Should be discussed. Although the bioinformatic analyses point into this direction, there are major concerns which must be solved before the study can provide the needed reliability and credibility for revision.
(5) All data should be benchmarked to the Wang et al 2020 and Garcia-Alonso et al. 2021 papers reporting very detailed scRNA-seq data, and not only the Stephen R. Quake 2020 paper.
(6) Fig. 2B: Vimentin staining is not at all clear. F-actin could be used to show the typical morphology of the stromal cells?
(7) Where does the term "EMT-derived stromal cells" come from? On what basis has this term been coined?
(8) CD44 is shown in Fig. 2D but the text mentions CD45 (line 159)?
(9) All quantification experiments (of stainings etc.) should be in detail described how this was done. It looks very difficult (almost not feasible) when looking at the provided pictures to count the stained cells.
(10) Fig. 3C: it is unclear how quantification can be reliably done. Moreover, OLFM4 looks positive in all cells of Ctrl, but authors still see an increase?
(11) Fig. 3F: Met is downregulated which is not in accordance with the mentioned activation of the PI3K-AKT pathway.
(12) Lines 222-226: transcriptome and proteome differences are not significant; so, how meaningful are the results then? Then, it is very hard to conclude an evolution from secretory phase to WoI.
(13) WoI organoids show an increased number of cilia. However, some literature shows the opposite, i.e. less ciliated cells in the endometrial lining at WoI (to keep the embryo in place). How to reconcile?
(14) How are pinopodes distinguished from microvilli? Moreover, Fig. 3 does not show the typical EM structure of cilia.
(15) There is a recently published paper demonstrating another model for implantation. This paper should be referenced as well (Shibata et al. Science Advances, 2024).
(16) Line 78: two groups were the first here (Turco and Boretto) and should both be mentioned.
(17) Line 554: "as an alternative platform" - alternative to what? Authors answer reviewers' comments by just changing one word, but this makes the text odd.
-
Reviewer #2 (Public Review):
In this research, Zhang et al. have pioneered the creation of an advanced organoid culture designed to emulate the intricate characteristics of endometrial tissue during the crucial Window of Implantation (WOI) phase. Their method involves the incorporation of three distinct hormones into the organoid culture, coupled with additives that replicate the dynamics of the menstrual cycle. Through a series of assays, they underscore the striking parallels between the endometrial tissue present during the WOI and their crafted organoids. Through a comparative analysis involving historical endometrial tissue data and control organoids, they establish a system that exhibits a capacity to simulate the intricate nuances of the WOI.
The authors made a commendable effort to address the majority of the statements. Developing an endometrial organoid culture methodology that mimics the window of implantation is a game-changer for studying the implantation process. However, the authors should strive to enhance the results to demonstrate how different WOI organoids are from SEC organoids, ensuring whether they are worth using in implantation studies, or a proper demonstration using implantation experiments.
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1 (Public Review):
Q1: First of all, the term organoid must be discarded. The authors just seed the endometrial cell mixture which assembles and aggregates into a 3D structure which is then immediately used for analysis. Organoids grow from tissue stem cells and must be passage-able (see their own description in lines 69-71). So, the term organoid must be removed everywhere, to not confuse the organoid field. It is not shown that the whole 3D assembly is passageable, which would be very surprising given the fact that immune and stromal cells do not grow in Matrigel because of the unfavorable growing conditions (which are targeted to epithelial cell growth).
We appreciate your highlighting concerns regarding our organoid construction.
(1) The organoids in our system originated from tissue stem cells.
We induced adult stem cells derived from endometrial tissue to construct organoids in vitro using various small molecules (such as Noggin, EGF, FGF2, WNT-3A and R-Spondin1), which involves a complex self-assembly process rather than a mere cellular assembly. Initially, two days after plating, the system contained single cells and small cell clusters. By the fourth day, the glandular epithelial cells had gradually assembled into glands, while the stromal cells spontaneously organized themselves around the glands. By the eleventh day, the endometrial glands had enlarged, epithelial cells were organized in a paving-stone arrangement, and stromal cells had established an extensive network. (Author response image 1) (Figure 1C)
(2) The organoids we constructed are passage-able.
Most organoids were used for experiments up to the fifth generation, while some were extended to the 10th generation and cryopreserved. (Response Figure 1B, C)
(3) Immune and stromal cells are present in our system from the primary to the fourth generation. In our study, immune and stromal cells were identified not only from scRNA-seq data (third generation of organoids) (Figure 2A), but also from the morphology using 3D transparent staining and light sheet microscopy imaging (third generation of organoids), with Vimentin marking stromal cells, CD45 designating immune cells, and FOXA2 identifying glands. Further, flow cytometric analysis was applied to verify immune cells within the organoids (third generation of organoids). (Response Figure 1D, E, F)
Moreover, immune cells and stromal cells can grow in Matrigel, which was also found in the study of organoid pioneer Hans Clevers (Hans Clevers et al., Nature Reviews Immunology 2019).
Author response image 1.
(A) The growth condition of endometrial cells was observed from day 2 to day 11 after plating under an inverted microscope. Scale bar = 200 μm. (B) The endometrial organoids of different passages were observed from P1 to P5. Scale bar = 200 μm. (C) Stromal cells formed an extensive network (down). The arrowhead indicates dendritic stromal cells. Scale bar = 100 μm (left), Scale bar = 50 μm (right). (D) Exhibition of stromal cells marked by vimentin. Nuclei were counterstained with DAPI. The arrow indicates stromal cells. Scale bar = 40 μm (up), Scale bar = 30 μm (down). (E) Exhibition of immune cells marked by CD45 and endometrial gland marked by FOXA2. Nuclei were counterstained with DAPI. The arrow indicates immune cells. Scale bar = 50 μm. (F) Flow cytometric analysis of T cells and macrophages in the endometrial organoid. Gating strategy used for determining white blood cells (CD45+ cells), T cells (CD45+CD3+ cells) and macrophages (CD45+CD68+CD11b+ cells).
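The gating hierarchy named in panel (F) can be written out as a small sketch. This is an illustration only: it assumes each event has already been called positive or negative for each marker (real gating works on fluorescence thresholds, which are not given here), and the function name and dictionary layout are hypothetical.

```python
def gate_populations(event):
    """Return the gates an event falls into, given per-marker positivity calls.

    `event` maps marker names to booleans, e.g. {"CD45": True, "CD3": False}.
    Markers missing from the dictionary are treated as negative.
    """
    gates = []
    if event.get("CD45", False):
        gates.append("white blood cell")      # CD45+
        if event.get("CD3", False):
            gates.append("T cell")            # CD45+ CD3+
        if event.get("CD68", False) and event.get("CD11b", False):
            gates.append("macrophage")        # CD45+ CD68+ CD11b+
    return gates
```

Returning a list rather than a single label avoids assuming that the T cell and macrophage gates are mutually exclusive, which the caption does not state.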
Q2: Second, the study remains fully descriptive, bombing the reader with a mass of bioinformatic analyses without clear descriptions and take-home messages. The paper is very dense, meaning readers may give up. Moreover, functional validation, except for morphological and immunostaining analyses (which are posed as "functional" but actually are only again expression) is missing, such as in vivo functionality (after transplantation e.g.) and embryo interaction. Importantly, the 3D structure misses the right architecture with a lining luminal epithelium which is present in the receptive endometrium in vivo and needed as the first contact site with the embryo. So, in contrast to what the authors claim, this is not the best model to study embryo interaction, or the closest model to the in vivo state (line 318, line 326).
Thank you.
(1) We have made the following improvements. Firstly, we have conducted additional experiments to validate the bioinformatics analysis. Secondly, the structure of the manuscript has been refined to ensure logical coherence and clear transitions between paragraphs. Thirdly, important findings have been emphasized to ensure readers’ comprehension and inspiration. Furthermore, the manuscript was revised by both domestic and international experts to enhance the readability and clarity.
(2) For the functional validation, in vivo transfer could not be carried out so far due to ethical limitations. But human embryos are able to develop and grow more efficiently in combination with the receptive endometrial organoids we generated (unpublished data).
(3) As you suggested, we replaced “closest” with “closer”. It is undeniable that the model cannot completely simulate the in vivo implantation process, in which the luminal epithelium of the endometrium contacts the embryo first.
Q3: Third, receptive endometrial organoids (assembloids; Rawlings et al., eLife 2021) and receptive organoid-derived "open-faced endometrial layer" (Kagawa et al., Nature 2022) have already been described, which is in contrast to what the authors claim in several places that "they are the first" (e.g. lines 87-88, 316-319, etc). These studies used real organoids to achieve their model (and even showed embryo interaction), while in the present study, different cell types are just seeded and assembled. Hence, logically, immune cells are present which are never found in real organoid models. The only original aspect in the present study is the use of hormones to enhance the WOI phenotype. However, crucial information on this original aspect is missing such as concentration of the hormones, refreshment schedule, all 3 hormones added together or separately, and all 3 required?
Thank you for pointing out these studies on endometrial organoids.
(1) While we didn’t explicitly state "the first", we agree that we should be careful with expressions similar to "the first". These have been changed to more modest expressions, as follows: “we are far from understanding how embryo implantation occurs during the WOI due to ethical limitations and fewer in vitro receptive endometrial model” and “which confirms that they are closer to the in vivo state”.
(2) The definition of organoids and the existence of immune cells have been addressed in detail in our response to the first question.
(3) In terms of hormone scheme, hormone concentrations have been detailed in Table S2 of Supplementary. Estrogen was supplemented to the basal medium for the initial two days, after which a combination treatment of MPA, cAMP, PRL, hPL, and HCG was administered for the subsequent six days. The medium was refreshed every two days.
All three hormones were deemed necessary, which was validated by multiple group comparisons. Only the organoids treated with all six hormones together exhibited an endometrial receptivity-related gene expression profile. (Author response image 2).
Author response image 2.
Heatmap showing receptivity related gene expression profile of organoids in each hormone regimen.
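The treatment timeline described in this response (estrogen for the first two days, then the combination treatment for six days, with medium refreshed every two days) can be laid out as a short sketch. Several points are assumptions made for illustration: day 0 is taken as the start of estrogen treatment, medium changes are assumed to fall on days 0, 2, 4 and 6, and only the supplements named in the response are listed per phase (whether estrogen is continued alongside the combination is not stated). Concentrations are in Table S2 and are not reproduced here; all names are hypothetical.

```python
ESTROGEN_DAYS = 2        # initial estrogen-only phase
COMBINATION_DAYS = 6     # subsequent combination phase
REFRESH_INTERVAL = 2     # medium refreshed every two days
COMBINATION = ("MPA", "cAMP", "PRL", "hPL", "HCG")


def medium_schedule():
    """List (day, supplements) for each medium change, with day 0 as the first."""
    schedule = []
    total_days = ESTROGEN_DAYS + COMBINATION_DAYS
    for day in range(0, total_days, REFRESH_INTERVAL):
        supplements = ("estrogen",) if day < ESTROGEN_DAYS else COMBINATION
        schedule.append((day, supplements))
    return schedule
```

Under these assumptions the schedule has four medium changes: one in the estrogen phase and three in the combination phase.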
Q4: Moreover, it is not a "robust" model at all as the authors claim, given the variability of the initial cell mixture (varying from patient to patient). Actually, the reproducibility is not shown. The proportions of the different cell types seeded in the Matrigel droplet will be different with every endometrial biopsy. It would be much better to recombine epithelial (passageable) organoids with stromal and immune cells in a quantified, standardized manner to establish a "robust" model.
Thanks for your suggestion.
Firstly, the constructed endometrial organoids generally consist of epithelial, stromal, and immune cells. However, it is undeniable that the cell proportions may vary slightly among different patients. Secondly, the term "robust" was intended to convey strong support for embryo development, which will be supported by our next study (unpublished data). Therefore, "robust" has been replaced here with "alternative". Thirdly, as for "reproducibility", the hormone-treated organoids from different women exhibited similarity to the in vivo receptive endometrium through multi-omics analysis, ERT, and various other experiments.
Reviewer #2 (Public Review):
Q1: With endometrial receptivity analysis, they suggest a successful formation of the implantation window in vitro, but this result is difficult to interpret.
Thanks for your question.
We understand that the most effective way to demonstrate endometrial receptivity is embryo implantation, which was conducted simultaneously and will be presented in our next study. In this study, we validated the receptivity based on current research.
(1) At the single-cell transcriptome level, the cellular composition and function of the receptive endometrial organoids were similar to those of the in vivo implantation window (Stephen R. Quake et al, 2020).
(2) At the whole-organoid level, the receptive endometrial organoids exhibited similar characteristics in transcriptome and proteome to the in vivo mid-secretory endometrium (Andres Salumets 2017, Qi Yu 2018, Triin Laisk 2018, Edson Guimarães Lo Turco 2018, Xiaoyan Chen 2020, Francisco Domínguez 2020, David W. Greening 2021, Norihiro Sugino 2023). The receptive endometrial organoids were also validated by the endometrial receptivity test (ERT), which utilizes high-throughput sequencing and machine learning to assess endometrial receptivity (Yanping Li et al., 2021).
(3) At the microstructural level under electron microscope, the receptive endometrial organoids exhibited characteristics of the implantation window, such as pinopodes, glycogen particles, microvilli, and cilia.
Overall, the receptive organoids we constructed closely resemble the in vivo implantation window at the single-cell, organoid, and microstructural levels based on existing research.
Q2: Analyzing transcriptome and proteome information of WOI organoids, authors demonstrate a strong response to estrogen and progesterone, but some comparisons are made with CTRL and SEC, and others only with CTRL, which limits the power of some results. In the same way, some genes related to Cilia and pinopodes appear dominant in WOI organoids, but the comparison by electron microscopy is made only against CTRL organoids.
In subsequent analysis, WOI organoids showed a marked differentiation from proliferative to secretory epithelium, and from proliferative epithelium to EMT-derived stromal cells than SEC organoids. These statements are based on their upregulation of monocarboxylic acid and lipid metabolism, their enhanced peptide metabolism and mitochondrial energy metabolism, or their pseudotime trajectories. However, other analyses (such as the accumulation of secretory epithelium or decreased proliferative epithelium, the increased ciliated epithelium after hormonal treatment, or the presence of EMT-derived stromal cells) show only small differences between SEC and WOI organoids.
Thank you for raising these important questions.
(1) At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), and both are similar at the overall organoid level.
(2) At the single-cell level, the accumulation of secretory epithelium, decreased proliferative epithelium, increased ciliated epithelium post hormonal treatment, or the presence of EMT-derived stromal cells are the fundamental features of the secretory endometrium. Therefore, these features are present in both WOI and SEC organoids. However, the most notable differences lie in the more comprehensive differentiation and varied cellular functions exhibited by WOI organoids compared to SEC organoids.
(3) Regarding electron microscopy, we have now quantitatively compared the presence of various characteristic structures such as microvilli, cilia, pinopodes and glycogen in the CTRL, SEC and WOI groups. It has been observed that WOI organoids possess longer microvilli and increased cilia, glycogen, and pinopodes compared to SEC organoids (Fig2H).
Reviewer #1 (Recommendations For The Authors):
Q1: Several of the key methods are performed by companies, hence not in detail described and therefore not verifiable which is essential for reviewers and readers.
We are grateful for the suggestion. Specific methods have now been incorporated into the "Supporting Information" section. (Line91~102, Line 107~123, Line 132~139)
Q2 - Line 49: It is not shown in the present study whether the WOI organoids are a 'robust' platform.
- Line 76: There is a study (Dolat L., Valdivia RH., Journal of Cell Science, 2021) that developed a co-culture with endometrial organoids and immune cells (neutrophils) which should be mentioned.
We have reconsidered the wording and replaced 'robust' with 'alternative' (Line 54). Following the reviewer's suggestion, we have also added this citation (Line 82-83) on the co-culture of immune cells with endometrial organoids; it was not cited previously mainly because the research model was mouse.
Q3: Figure 1: Endometrial organoids possess endometrial morphology and function. - The authors should further explain their decision to add PRL, hCG, and hPL to the organoid culture. Why these particular compounds? What is their specific role during the WOI?
Regarding the hormone regimen, estrogen and progesterone promote the transition of endometrial organoids into the secretory phase, and on this basis, pregnancy hormones can further promote their differentiation. PRL promotes immune regulation and angiogenesis during implantation, hCG improves endometrial thickness and receptivity, and hPL promotes the development and function of endometrial glands. Our constructed WOI organoid is in a state conducive to embryo implantation. We aim to develop an in vitro model for embryo implantation study. The detailed explanation of this aspect was initially provided in the Discussion section (Lines 298–313). To make the selection of the hormonal regimen clearer for reviewers and readers, we have now articulated it in the Results section (Lines 124–130).
When selecting hormone formulations, we compared multiple groups. The number, area, and average intensity of organoids in these groups were similar over time. However, compared with the other hormone formulations, the WOI organoids showed a gene expression profile associated with endometrial receptivity: genes positively correlated with receptivity were highly expressed, and genes negatively correlated with receptivity were lowly expressed (added to Figure S1E, S1F). Hormone dosage was primarily based on levels in the peri-pregnant maternal body or the local endometrium (Margherita Y. Turco et al., Nature Cell Biology 2017).
- Line 108: "the endometrial cells" instead of "endometrial organoid"? Because the authors also refer to the stromal cells.
We believe you are referring to the sentence “The endometrial organoid, consisting of vesicle-like glands, fibrous stromal cells, and other surrounding cells, developed into a 3D structure with the support of Matrigel”. An organoid is a self-assembled 3D structure that consists of multiple cell types and closely resembles in vivo tissue or organ. It offers high expansibility as well as phenotypic and functional properties. Here, we aim to delineate the endometrial organoid, comprising epithelial cells, stromal cells, and other cellular components that assemble to form intricate 3D structures. Hence, the term "endometrial organoid" is more appropriate.
- Line 110: "the endometrial glands", do the authors mean the endometrial organoids? The authors also mention they enlarge, which must be quantified.
We believe you are referring to the sentence “As the organoids grew and differentiated, the endometrial glands enlarged, epithelial cells adopted a paving stone arrangement, and stromal cells formed an extensive network”. Here, we mean that the endometrial glands grow progressively within the organoids. We agree with your suggestion and quantified the change in organoid area over time; it increased progressively in all three groups (shown below) (Fig. S1E) (Line 130-131).
Author response image 3.
The dynamic changes of the area of organoids over time in the CTRL, SEC and WOI organoids.
- Line 112: E-cadherin is a general epithelial marker, not a glandular marker.
We agree with your suggestion and have changed this to ‘The epithelium marker E-cadherin’ (Line 110).
- Line 116: Which group was used for KI67 and CC3 staining?
The CTRL organoids were used for Ki67 and CC3 staining. We have modified this expression in the Figure 1E Legend.
- Line 123: Organoid size (diameter or area) needs to be quantified to claim that WOI organoids grow slower than SEC/CTRL organoids. The same goes for Ki67+ cells for proliferation. In the legend of Fig 1B, the authors in contrast state that the organoids show a similar growth pattern.
We are extremely grateful to you for pointing out this problem. We quantitatively analyzed the size of organoids in the three groups. The area increased over time in all three groups; growth was most vigorous in the CTRL group, followed by the SEC group and then the WOI group, but the differences were not statistically significant. Relevant results have been added to Figure S1E (Line 130-131). There were no significant differences in Ki67 expression among these organoids. Therefore, the three groups of organoids showed a similar growth pattern. We decided to delete the statement “Following hormonal stimulation, WOI organoids exhibited slower growth than SEC and CTRL organoids, while CTRL organoids maintained robust proliferative activity (Fig. 1B)”.
Author response image 4.
The dynamic changes of the area of organoids over time in the CTRL, SEC and WOI organoids.
- Line 126: Fourteen days of organoid treatment is a very long time. Growing organoids may already be dying which should be checked by CC3 staining to prove that organoids are still fully viable.
Endometrial organoids are vigorous in proliferation and have a long survival period due to the presence of adult stem cells. To address your queries effectively, we conducted CC3 staining on the organoids treated for 14 days, revealing negligible expression levels (shown as below).
Author response image 5.
Figure note: The Ki67 and CC3 immunostaining on the organoids after 14-day hormone treatment.
- Line 128: Changes in hormone receptors should be supported by RT-qPCR data to be more convincing
We agree with your suggestion. Here we supplemented the RT-PCR results of hormone receptors as follows (Figure S1D) (Line119-121). PAEP and PGR are associated with progesterone, and OLFM4 and EGR1 are associated with estrogen.
- 1A: Are authors able to see and characterize decidualized stromal cells as indicated in the illustration?
Upon the reviewer's inquiry, we carefully observed the morphology of stromal cells in hormone-treated organoids. Regrettably, the morphology of decidualized stromal cells was not ascertainable through light microscopy in our endometrial organoids.
- 1C: Which treatment condition are the organoids in these images?
This figure showed the bright-field morphology of the CTRL organoids, which is now noted in the Figure 1C legend.
- 1D: PAS staining should be quantified to support the claims.
We agree with your suggestion. The quantitative comparison of PAS staining was conducted in these three groups of organoids (Figure S1G) (Line142-143)
- 1D: Where are the stromal cells in the model? There should be vimentin-positive cells outside of the glands.
Figure 1D shows the results of section staining, which is limited in its ability to display stromal cells around the gland. Considering the 3D structure of organoids, we performed organoid clearing and staining and observed stromal cells (marked by vimentin) under a light-sheet microscope (shown below). The stromal cells were also presented using this method in the original Figure 2B.
Author response image 6.
Exhibition of stromal cell marked by vimentin of CTRL organoid through whole-mount clearing, immunostaining and light sheet microscopy imaging. Nuclei were counterstained with DAPI. The arrowhead indicates stromal cells. Scale bar = 70 μm.
Figure 2: Developing receptive endometrial organoids in vitro mimicking the implantation window endometrium.
- Line 142: CD44 is not an exclusive marker for immune cells. It has been shown to be expressed in glandular secretory epithelial cells (Fonseca et al., 2023). The authors also mention that CD44 is expressed in stromal cells (line 265). Staining for CD45 (or another immune-specific marker) is needed to demonstrate the presence of immune cells.
We appreciate your suggestions. We demonstrated the distribution of immune cells in organoids using the organoid clearing technique in combination with light-sheet microscopy imaging, using CD45 as a marker (Figure 2C).
- Line 144: What are the proportions of the immune cells? What is the variation between patient samples?
We assessed the proportion of immune cells by flow cytometry and analyzed the proportions of macrophages and T cells in organoids derived from 8 patients. The proportion of WBCs in organoids was about 3%~4% (Figure 2D), of which macrophages accounted for less than 1% and T cells for less than 2% (Figure S2E). A few patients showed large heterogeneity, but the proportion of immune cells in most patients was relatively stable.
- Line 161: What is the endometrial receptivity test (ERT)? Not explained at all.
Endometrial Receptivity Test (ERT) is a kind of gene analysis-based method for detecting endometrial receptivity, which combines high-throughput sequencing and machine learning to analyze the expression of endometrial receptivity-related genes, allowing for a relatively accurate assessment of endometrial receptivity. It is currently used in clinical practice to determine endometrial receptivity and guide personalized embryo transfer (Yanping Li et al., J Transl Med 2021). (line179-183)
- 2A: The authors' dataset is compared to a published dataset. How were they combined? Were they merged, mapped on each other, or integrated? Were all cells employed from the published dataset or specific cell types? Much detail to evaluate the analysis is missing.
We are very grateful for your comments.
(1) The four raw datasets (CTRL, SEC and WOI organoids, and mid-secretory endometrium) underwent batch correction and integration using Harmony. Subsequently, the integrated dataset underwent dimensionality reduction via PCA. The soft k-means clustering algorithm was employed to address batch effects and clustering, utilizing a clustering parameter resolution of 0.5. Finally, the clustering results were visualized using tSNE based on the cell subpopulation classification. (“Methods” Line164-175)
(2) The Figure 2A displayed comparison of glandular and luminal epithelium, secretory epithelium, LGR5 epithelium, EMT-derived stromal cells, ciliated epithelium, and glandular secretory epithelium (shown as Figure S2C~S2D) (Line150-154)
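For readers unfamiliar with the term, the "soft k-means" step mentioned in (1) can be illustrated with a minimal Python sketch. This is not the authors' pipeline and not Harmony itself: it omits Harmony's batch-diversity penalty and its correction step, and the function names, the toy 2D points, and the `sigma` parameter are all assumptions made for illustration. It shows only the core idea that each cell receives a probability of belonging to every cluster rather than a single hard label.

```python
import math

def soft_assignments(point, centroids, sigma=1.0):
    """Return soft membership probabilities of one point over all centroids.

    Each weight is proportional to exp(-squared_distance / sigma), so nearer
    centroids receive larger probabilities; the weights sum to 1.
    """
    sq_dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                for centroid in centroids]
    # Subtract the smallest distance before exponentiating for numerical stability.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / sigma) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def update_centroids(points, memberships):
    """Recompute each centroid as the membership-weighted mean of all points."""
    n_clusters = len(memberships[0])
    n_dims = len(points[0])
    centroids = []
    for k in range(n_clusters):
        weight_sum = sum(m[k] for m in memberships)
        centroids.append([
            sum(m[k] * p[j] for p, m in zip(points, memberships)) / weight_sum
            for j in range(n_dims)
        ])
    return centroids

def soft_kmeans(points, centroids, sigma=1.0, n_iter=20):
    """Alternate soft assignment and centroid update for a fixed number of rounds."""
    for _ in range(n_iter):
        memberships = [soft_assignments(p, centroids, sigma) for p in points]
        centroids = update_centroids(points, memberships)
    memberships = [soft_assignments(p, centroids, sigma) for p in points]
    return centroids, memberships

# Toy 2D "embedding" (hypothetical values, not data from the study).
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
initial = [[0.0, 0.0], [5.0, 5.0]]
centroids, memberships = soft_kmeans(points, initial)
```

In a real analysis the inputs would be the PCA embeddings of the cells, and a smaller `sigma` would make the assignments behave more like hard k-means.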
- 2E: Please add the cell type names above the heatmaps to improve readability.
Thanks to your suggestion, we have added the cell type names above the heatmaps.
- 2G: The difference between the left and right graphs is not clear from the figure itself. Improve by adding a title and more explanation.
Thanks for your careful review. We have added the title to the left and right graphs.
Supplementary Figure 3 is referenced with Figure 2. Supplementary Figure 2 is referenced with Figure 3. The order needs to be changed.
Thanks for your careful review. We have changed the order.
- S3B: Typical markers for annotation of the different cell clusters are not included and therefore it is not convincing enough that annotations are correct. E.g. Epithelial markers (EPCAM, CDH1), Stromal cells (VIM, PDGFRA), SOX9+LGR5+ cells (SOX9, LGR5). How were the EMT-derived stromal cells designated? It is not clear from the data whether they are in fact EMT-derived or whether they show epithelial markers as well (stated in line 246).
We deeply appreciate your suggestion. We provided more details to describe the cell clustering as the following. Single-cell transcriptomics analysis referred to CellMarker, PanglaoDB, Human Cell Atlas, Human Cell Landscape, and scRNASeqDB, and previous endometrium related studies. (W. Wang et al., Nat Med 2020, P. D. Harriet C. Fitzgerald et al., PNAS 2019, K. M. Thomas, M Rawlings et al., eLife 2021, L. Garcia-Alonso et al., Nat Genet 2021)
(1) SOX9+LGR5+ cells: SOX9 and LGR5 are both proliferative markers. SOX9 is expressed dispersedly in all clusters. LGR5 is mainly expressed in two clusters, one of which is stem-derived epithelium, while the other expresses LGR5 in a scattered manner. With reference to the markers of SOX9+LGR5+ cells, SOX9+LGR5- cells, and SOX9+ proliferative cells reported in Nature Genetics (L. Garcia-Alonso et al., Nat Genet 2021), the cells in this cluster expressed high levels of NUAK2, CNKSR3, FOS and LIF, which was consistent with the expression profiles of SOX9+LGR5+ cells and SOX9+ proliferative cells. However, considering that the number of cells expressing LGR5 was relatively small, this cluster of cells was renamed SOX9+ proliferative epithelium.
Figure 3: Receptive endometrial organoids recapitulate WOI-associated biological characteristics. - Line 173-174: The WOI organoids should be compared in detail to the SEC organoids in addition to the CTRL organoids, to show that this WOI model and new hormonal treatment is providing better results compared to the SEC organoids and the results obtained in previous studies.
Thanks for your suggestion. At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), which prompted us to continue exploring at the single-cell level.
- Line 190: Quantification of pinopodes is required to claim that they are more densely arranged in WOI organoids.
- Line 190-191: Again, is there a difference in pinopode presence between the WOI and SEC organoids to show that the WOI organoids are really distinct and a better model?
We agree with the reviewer’s suggestion and quantified the pinopodes. The numbers of pinopodes increased from CTRL to SEC to WOI organoids, with WOI organoids having the most abundant pinopodes under the electron microscope. (Figure 2H) (Line 184-186)
- Line 194: Also here, quantification of the glycogen particles is missing.
We agree with your suggestion. We have quantified the area of glycogen particles under electron microscope in the CTRL, SEC and WOI organoids. It was found that WOI organoid had the most glycogen particles. (Figure 2H) (Line184-186)
- 3C: There is no difference between SEC and WOI organoids condition for OLFM4 and PRA/B. What is the purpose then of adding extra hormones if no difference is present?
The figure 3C indicated that there was no significant difference in OLFM4 and PRA/B level (reflecting estrogen and progesterone responsiveness) in SEC and WOI organoids at the organoids level. It is understandable because WOI organoids are induced further into the implantation window on the basis of the secretory phase (i.e., SEC organoids), and both are similar at the overall level of organoids. Based on this, we further explored the differences between WOI organoids and SEC organoids at the single-cell level.
- 3G: A higher magnification is necessary to evaluate cilia staining. From these images, it seems like CTRL organoids also express acetyl-α-tubulin.
Thanks for your suggestion. The figure has been enlarged and is shown below. The acetyl-α-tubulin of WOI organoids differs from that of CTRL organoids in morphology and expression level. The glands of WOI organoids have small green tips (expressing acetyl-α-tubulin) convex toward the lumen. WOI organoids expressed a higher level of acetyl-α-tubulin than CTRL organoids. (Now replaced with Figure 3G in the revised draft).
Figure 4: Structural cells construct WOI with functionally dynamic changes
- Line 211: To which figure are these claims referring to?
We believe you are referring to the sentence “In terms of energy metabolism, the WOI organoids exhibited upregulation of monocarboxylic acid and lipid metabolism, and hypoxia response”. Up-regulation of monocarboxylic acid and lipid metabolism in WOI organoids is reflected in Figure 3B, and up-regulation of hypoxia responses is reflected in Figure S3F.
- In general, it should be stated in the text that CellPhoneDB is a useful tool to investigate ligand-receptor interactions, however, it only proposes potential interactions. To validate such interactions, stainings and functional assays are required.
Thanks for your suggestion. CellPhoneDB was originally introduced only briefly in the "Methods" section of the "Supporting Information". It is now also explained in the main text (Line 256-257).
We agree that staining and functional assays are required to validate the ligand-receptor interactions. Therefore, we used the proximity ligation assay (PLA) to verify the trend of interaction. (Figure S2J, Line259-261, Line 277-279, Line 285-288)
- Line 243: Please describe the process of EMT in the endometrium more specifically.
EMT is a common and crucial biological event in the endometrium during the implantation window. During the EMT process, epithelial cells lose their epithelial characteristics while gaining migratory and invasive properties of fibroblasts.
During the attachment and adhesion phases of embryo implantation, interactions mediated by trophoblastic factors (e.g. integrins) and maternal ECM factors (e.g. fibronectin) induce the eventual EMT in the trophectoderm. During the peri-implantation period, microRNAs that regulate EMT (e.g. miR-429 and miR-126a-3p) are expressed in the maternal luminal epithelium to different degrees, mediating its transformation as the blastocyst invades the maternal decidua. The endometrial epithelium transforms into epithelioid stromal cells with increased migratory and invasive capacities through the EMT process. The decidual stromal cells, having acquired increased motility, migrate away from the implantation site. (Line 265-267)
- Lines 247-251 and 313-316: the claim that proliferative epithelium transforms into EMT derived stromal cells by pseudotime trajectory is too bold and must be underpinned by other means. Pseudotime analysis only suggests and is by definition biased since the first/originating population must be defined by the operator.
In addition to pseudotime analysis with Monocle, we used RNA velocity analysis with scVelo to analyze cell-state transitions. The two analyses corroborate each other if both indicate a transformation from proliferative epithelium to EMT-derived stromal cells. RNA velocity analysis infers the direction of differentiation automatically, which can serve as evidence for choosing the starting point of the pseudotime analysis.
RNA velocity analysis showed that the EMT-derived stromal cells were most closely connected to the proliferative epithelium. In addition, the pseudotime plot inferred the proliferative epithelium as the root cell population. Together, the RNA velocity and pseudotime analyses support the transformation from proliferative epithelium to EMT-derived stromal cells.
Author response image 7.
RNA velocity connectivity diagram (to infer intercellular connectivity)
Author response image 8.
Time differentiation of cells
Discussion
- Line 300-302: It would be interesting to investigate ATP production and IL8 release in the WOI organoids to validate with findings from in vivo.
To address this point, we examined ATP production and IL8 release. WOI organoids indeed produced much more ATP and IL8 than CTRL and SEC organoids (Figure S3L) (Line 323-324).
- Line 313-316: Do the WOI organoids lose polarity and cell-to-cell junctions?
Transcriptome sequencing revealed downregulation of cell adhesion and RHO GTPase signaling in WOI organoids (Figure 3B). Electron microscopy revealed that the cellular arrangement of WOI organoids was slightly looser than that of CTRL organoids, but the microvilli were still oriented toward the medial side of the glands and did not undergo polarity reversal (shown as below).
Author response image 9.
Electron micrograph of the CTRL (left), and WOI (right) endometrial organoid. Scale bar = 5 μm.
- Line 322: Where is the data that shows that 'a decreased abundance of immune cells', is observed?
A decreased abundance of immune cells was observed through single-cell transcriptome sequencing and flow cytometry. The number of immune cells was reduced in WOI organoids compared to CTRL organoids in single-cell sequencing results (Figure 4A). Besides, flow cytometry also showed that the percentage of WBCs in WOI organoids was lower than that in CTRL organoids (Figure S2F).
- Line 324: Elaborate more on how the immune cell composition differs from the endometrium.
The differences of immune cell composition between organoids and endometrium were mainly reflected in the proportion of WBC, the proportion of immune cell subtypes and the changes of T cells after entering the implantation window.
Firstly, the proportion of WBCs in organoids was lower than that in endometrium. Flow cytometry showed that the proportion of WBCs in organoids was about 3%~4% (Figure 2D), but the proportion of WBCs in endometrium was about 8% (W. Wang et al., Nat Med 2020). Secondly, the proportions of T cells and macrophages in organoids were about 2%~3% and 1% (Figure 2D), respectively, but the proportions of lymphocytes and macrophages in endometrium were 7%~8% and 0.6%~0.7% (W. Wang et al., Nat Med 2020). Thirdly, after entering the implantation window, T cells in WOI organoids decreased (Figure S2F), while T cells in endometrium increased (W. Wang et al., Nat Med 2020). These three aspects differ between the in vivo endometrium and the in vitro organoids. (Line 347-353)
Material and Methods
- What are the concentrations of all medium components?
Thank you for your suggestion. The concentrations of all medium components have now been added to Table S1.
- Authors mention 10x while Smartseq2 is mentioned in Dataset S7?
Thanks for your careful review. Single cell transcriptome sequencing in this study was done using 10X Genomics. Smartseq2 was used to sequence the transcriptome of a gland and its surrounding cells, which can be regarded as small bulk RNA sequencing. A small number of cells are utilized in Smartseq2 to construct a full-length mRNA library with enhanced transcript sequencing coverage, making it particularly well-suited for small-scale samples such as organoids.
The data in Dataset S7 are acquired from small bulk RNA-seq with Smartseq2.
Reviewer #2 (Recommendations For The Authors):
Q1: The theoretical choice of extra reagents added to the WOI organoids culture (PRL, hCG, and hPL) is theoretically justified, but not experimentally. On what previous studies, or performed experiments, are the choice of conditions used based?
When selecting hormone formulations, we compared multiple groups. The number, area, and average intensity of organoids in these groups were similar over time. However, compared with the other hormone formulations, the WOI organoids showed a gene expression profile associated with endometrial receptivity: genes positively correlated with receptivity were highly expressed, and genes negatively correlated with receptivity were lowly expressed (added to Figure S1E, S1F). Hormone dosage was primarily based on levels in the peri-pregnant maternal body or the local endometrium (Margherita Y. Turco et al., Nature Cell Biology 2017).
Q2: Text in line 111 indicates that "stromal cells formed an extensive network", but vimentin fluorescence is not present on any image surrounding organoids in that figure. This assertion could only be supported by the subsequent results in Figure 2B. In addition, it is not indicated what kind of organoids have been used for these experiments
The stromal cells are arranged around the glands in the 3D structure (as shown in Figure 1C and Figure 2B, obtained by bright-field high-magnification photography and by clearing, staining, and microscopy imaging of the organoids, respectively). However, immunostaining of sections involves many steps of fixation, embedding, staining and elution, which make it difficult to preserve the arrangement and morphology of the stromal cells in the slice; the stromal cells were therefore not intentionally captured in the other images.
Figure 1C and Figure 2B are both CTRL organoids, which are now noted in the corresponding figure legend section.
Q3: It is not clear how glycogen secretion into the lumen is assessed in Figure 1D.
Glycogen from the subnuclear region of the glandular cells gradually reaches the top of the cells, i.e., the supranuclear region, and is discharged into the glandular lumen as parietal plasma secretion. Glycogen-containing eosinophilic secretion can be seen in the glandular lumen in Figure 1D.
Q4: Assertions about differences in proliferation between groups are purely subjective; some kind of measurement and analysis would be necessary to be sure that there is differential proliferation based on Figure 1B.
We are extremely grateful to you for pointing out this problem. We quantitatively analyzed the size of organoids in the three groups. The area increased over time in all three groups; growth was most vigorous in the CTRL group, followed by the SEC group and then the WOI group, but the differences were not statistically significant. Relevant results have been added to Figure S1E (Line 130-131).
Q5: For progesterone receptor expression analysis organoids are cultured for fourteen days. What is the basis for this change in culture time?
The choice of time point here is based on the 14-day secretory phase of the female menstrual cycle, when stimulation of the endometrium by estrogen and progesterone reaches its maximum level.
Q6: "n" number of individuals analysed through single-cell transcriptomics is not indicated.
One patient's endometrium was simultaneously constructed into CTRL, SEC and WOI organoids, which were then subjected to single-cell transcriptome sequencing. This is described in the Supporting Information (Line 141-142).
Q7: Where does the classification of EMT-derived stromal cells come from?
EMT is a common and crucial biological event in the endometrium during the implantation window. During the EMT process, epithelial cells lose their epithelial characteristics while gaining migratory and invasive properties of fibroblasts.
This cluster of cells expresses both epithelium markers CDH1 and EPCAM, and specifically expresses high levels of the EMT-related stromal cell markers AURKB, HJURP and UBE2C. During endometrial EMT, AURKB upregulates MMP2, VEGFA/Akt/mTOR and Wnt/β-catenin/Myc pathways to induce EMT (Zhen Wang et al., Cancer Manag Res 2020). HJURP also activates Wnt/β-catenin signaling to promote EMT (Y Wei et al., Eur Rev Med Pharmacol Sci 2019, Tianchi Chen et al., Int J Biol Sci 2019). UBE2C is upregulated by estrogen to promote EMT (Yan Liu et al., Mol Cancer Res 2020). Therefore, this cluster was defined as "EMT-derived stromal cells”.
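The annotation logic described above (a cluster that expresses the epithelial markers CDH1 and EPCAM and also expresses AURKB, HJURP and UBE2C at high levels) can be illustrated with a small Python sketch. This is only an illustration of marker-based labelling, not the authors' annotation code: the function names, the expression threshold, and the toy mean-expression values are all hypothetical.

```python
EPITHELIAL = ["CDH1", "EPCAM"]            # epithelial markers named in the response
EMT_RELATED = ["AURKB", "HJURP", "UBE2C"]  # EMT-related markers named in the response

def marker_score(cluster_means, markers, threshold=1.0):
    """Fraction of marker genes whose mean expression exceeds the threshold."""
    hits = sum(1 for gene in markers if cluster_means.get(gene, 0.0) > threshold)
    return hits / len(markers)

def label_cluster(cluster_means, threshold=1.0):
    """Assign a candidate label from marker coverage (hypothetical rule)."""
    epi = marker_score(cluster_means, EPITHELIAL, threshold)
    emt = marker_score(cluster_means, EMT_RELATED, threshold)
    if epi == 1.0 and emt == 1.0:
        return "EMT-derived stromal cells (candidate)"
    if epi == 1.0:
        return "epithelium (candidate)"
    return "unassigned"

# Toy, made-up mean expression values for one hypothetical cluster.
toy_cluster = {"CDH1": 1.8, "EPCAM": 2.1, "AURKB": 1.5,
               "HJURP": 1.2, "UBE2C": 2.4, "VIM": 0.3}
toy_label = label_cluster(toy_cluster)
```

A rule of this kind only proposes a candidate label; as the reviewers note elsewhere, such labels still need support from staining or functional evidence.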
Q8: In the endometrial receptivity test (ERT), endometrium sample data matches with pre-receptive endometrium and WOI organoids data matches with a receptive endometrium, but why there is no information about CTRL and SEC organoids?
We performed ERT on these samples at a time when our hospital had a cooperative project with Yikon Genomics (Jiangsu, China). However, only endometrium and WOI organoids were sent for testing because of limited quotas. Considering the end of the cooperation and the batch effect, no further CTRL and SEC organoids were tested. Moreover, the current ERT is a machine learning model based on sequencing data from endometrium samples, and there are still differences in cellular composition between endometrial organoids and endometrium. Thus, the results need to be interpreted in conjunction with other results.
Q9: When analysing the transcriptome and proteome, some comparisons are made between WOI vs CTRL and SEC, or just WOI vs CTRL. It would be interesting to have all the comparisons since the power of WOI organoids lies in their differences with SEC organoids.
Thanks for your suggestion. At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), which prompted us to continue exploring at the single-cell level.
Q10: Electron microscopy comparisons with respect to pinopods, cilia, and microvilli are only performed between WOI and CTRL. It would be interesting to check it with SEC.
We now quantitatively compared the presence of various characteristic structure like microvilli, cilia, pinopodes and glycogen in the CTRL, SEC and WOI organoids. It was found that WOI organoid had longer microvilli and increased cilia, glycogen, and pinopodes (Figure 2H).
Q11: Line 190 states that pinopods are arranged more densely in WOI organoids than in CTRL organoids. Seems to be a subjective observation. Is there an objective method to quantify this?
We agree with the reviewer’s suggestion and quantified the pinopodes. The numbers of pinopodes increased from CTRL to SEC to WOI organoids, with WOI organoids having the most abundant pinopodes. (Figure 2H) (Line 184-186)
Q12: Some characteristics are very similar between WOI and SEC organoids (such as the accumulation of secretory epithelium or decreased proliferative epithelium, the increased ciliated epithelium after hormonal treatment, or the presence of EMT-derived stromal cells). The authors should complement the discussion by objectively justifying the use of WOI versus SEC organoids. Would they be useful in more specific cases or at a general level when studying implementation?
Thanks for your comments. WOI organoids are differentiated from SEC organoids towards the implantation window. Therefore, WOI organoids are suitable for studying peri-implantation physiological changes or exploring pathological mechanisms. SEC organoids can be used when studying only pathological problems such as endometrial secretory phase changes or hormone reactivity. (Line 365-368)
Q13: ExM media is described in Table S1, but it does not include the concentration of the different reagents in the culture medium, which is the most interesting data about the ExM medium.
Thank you for your suggestion. The concentrations of all medium components have now been added to Table S1.
Q14: It is not specified which organoid pass is used in each experiment. Is it always the same pass?
Our experiments were conducted using P1~P3 generation endometrial organoids, as specified in the “Supporting Information” Line 54~55.
Q15: As a protocol for freezing organoids is included in materials and methods, do the authors use freshly cultured organoids or do they cryopreserve them and thaw them for culturing?
Thanks for your question. We used freshly cultured organoids in the manuscript. We listed the freezing protocol to illustrate that the constructed organoids can be frozen and recovered for special experimental needs and the establishment of sample banks.
Q16: The most important point: Neither of the two studies that developed human endometrial organoids from tissue biopsies (Boretto et al. 2017 and Turco et al. 2017), observed stromal cell growth in culture. They disappeared between the first and second pass (as indicated by Turco et al. 2017). How do the authors justify the presence of stromal cells in their organoid culture if they rely on the protocols previously described by these research groups? If it is the case that they can only use the initial pass (freshly planted cells from endometrium), it does not make sense to include the freezing of the different passes in materials and methods, since the expansion capacity of the culture would be lost, which implies a major limitation of the model.
Thanks for your question.
(1) We did not completely follow the protocols of these research groups. To maximize the recovery of both epithelial and stromal cells, we optimized key steps such as tissue digestion and cell strainer filtration. We shortened the digestion time to 20 minutes to protect cells from the digestion solution and retain some cell aggregates, which are beneficial for maintaining cell stemness and preserving stromal and immune cell clusters. A 40 μm filter membrane was used to isolate the endometrial cells, which may retain both epithelial and stromal cells.
(2) Our experiments were conducted using P1~P3 generation of freshly constructed organoids. However, we also used recovered organoids when fresh endometrial samples were not available due to the COVID-19 epidemic. We found that the organoids (e.g., P0~P5) still exhibited vigorous growth after recovery and could continue to be cultured by passaging (shown below).
The recovered organoids can be used for special experiments and biobank establishment.
Author response image 10.
The endometrial organoids of different passages were observed before cryopreservation and after recovery. Scale bar = 200 μm.
Q17: It is not clear which organoids include Figure S2F. Does it include the three types of organoids or just WOI organoids?
This circle diagram showed the functions of upregulated genes in the WOI group compared to CTRL group from combined transcriptome and proteome analysis, which has been labeled in the figure legend section.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
Therapeutic treatments for congenital and acquired craniofacial (CF) bone abnormalities are not well developed. This study provides convincing evidence for an innovative regenerative treatment for pediatric craniofacial bone loss using Jagged1-PEG-MAL hydrogel with pediatric human bone cells. The report is a valuable advance in this field.
-
Reviewer #1 (Public Review):
Summary:
In this manuscript, the authors conducted an important study that explored an innovative regenerative treatment for pediatric craniofacial bone loss, with a particular focus on investigating the impacts of JAGGED1 (JAG1) signaling.
Strengths:
Building on their prior research involving the effect of JAG1 on murine cranial neural crest cells, the authors demonstrated successful bone regeneration in an in vivo murine bone loss model with a critically-sized cranial defect, where they delivered JAG1 with pediatric human bone-derived osteoblast-like cells in the hydrogel. Additionally, their findings unveiled a crucial mechanism wherein JAG1 induces pediatric osteoblast commitment and bone regeneration through the phosphorylation of p70 S6K. This discovery offers a promising avenue for potential treatment, involving targeted delivery of JAG1 and activation of downstream p70 S6K, for pediatric craniofacial bone loss. Overall, the experimental design is appropriate, and the results are clearly presented.
-
Reviewer #2 (Public Review):
The current manuscript undoubtedly demonstrates that JAG1 can induce osteogenesis via non-canonical signaling. In fact, using the mouse-calvarial critical defect model, the authors have clearly shown the anabolic regenerative effect of JAG1 via non-canonical pathways. Exploring the molecular mechanisms, the authors have shown that non-canonically JAG1 is regulating multiple pathways including STAT5, AKT, P38, JNK, NF-ĸB, and p70 S6K, which together possibly culminate in the activation of p70 S6K. In summary these findings have significant implications in designing new approaches for bone regenerative research.
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1 (Recommendations For The Authors):
Major comments:
(1) Regarding the cell studies of human pediatric bone-derived osteoblast-like cells (HBO), the authors should provide a rationale for their selection of specific cell lines (15, 16, 17, 19, 20, 23, 24) in this study. As for animal studies, could the authors clarify which cell lines were utilized in the murine in vivo experiments?
We appreciate the opportunity to address this. To reduce confusion, we have numbered the patient primary cell lines used in these studies sequentially from 1 – 7. Additionally, we have added “HBO cell lines used for experiments were selected based on the ability of the primary cell line to proliferate and mineralize in culture” to the Methods section.
In vivo experiments: “HBO cell lines 2, 6 and 7 from separate individuals were selected for these experiments based on similar growth and passage characteristics.” This statement is included in the Methods section.
(2) In this study, the authors performed the murine in vivo experiments using both male and female mice. Could the author clarify if any difference was observed between male and female mice in the findings? This information would contribute to a more comprehensive understanding of the study.
We agree and have added the following to the Results section: “There was no sex-based difference in regenerated bone volume.”
(3) Although the histological results showed an elevated collagen expression in mice treated with BMP2, JAG1, and JAG1 + DAPT compared to those treated with the cells alone, the differences among groups were subtle. The authors should consider the immunohistochemical (IHC) staining for collagen 1 on the samples, allowing for a quantitative assessment of collagen 1 expression.
Thank you for this comment. The differences between BMP2, JAG1, and JAG1 + DAPT are indeed subtle. We have added Supplementary Figure 5, showing collagen staining of sections from the same FFPE blocks that were sectioned and stained with Masson Trichrome in Figure 2C.
Minor Comments:
(4) Please specify which cell lines are represented in the staining results shown in Fig.1A and Fig. 5A, respectively.
In Fig 1A the representative images are of HBO2. Fig 5A representative images are of HBO7. We have added this information to the figure legends for these figures.
(5) There appears to be a discrepancy in the specified size of the critical defect. The manuscript states that the size is 4mm, while Supplemental Figure 3 indicates 3.5mm.
Thank you for this catch! Yes, it should be 4mm. This has been corrected in Supplementary Figure 3.
(6) The scale bar for Figure 2 C is missing.
Scale bars have been added which also gave us an opportunity to brighten the images equally, allowing for better distinction between the different colors of the Masson Trichrome staining.
(7) In the methodological section 2.5 for JAG1 delivery, it would be helpful if the authors could review the initial dosage of JAG1 delivery to confirm if HBO cells were included or not, given that the MicroCT results indicate that all groups incorporated HBO cells.
We appreciate this suggestion. In response to another question, we have added Supplementary Figure 4 which includes an “Empty Defect” condition with no HBO cells, making the original method statement accurate.
Reviewer #2 (Recommendations For The Authors):
In the current study, using in vitro and in vivo models the authors clearly show that JAG1 can enhance osteogenesis and thus can be helpful in designing new therapeutic approaches in the field of bone regenerative research. The in vivo mouse CF model is very convincing and shows that JAG1 promotes osteogenesis via non-canonical signaling. Mechanistically it seems that JAG1 activates STAT5, AKT, P38, JNK, NF-ĸB, and p70 S6K. However, additional evidence is needed to convincingly conclude that all the non-canonical pathways activated via JAG1 converge at p70 S6K activation. The following concerns need to be addressed.
(1) In Fig 1A: Even though the Jag1-Fc shows a very significant increase in HBO mineralization, there are no significant increases in cells in osteogenic media when compared to control growth media. Even though the different conditions were subjected to RNAseq analysis in the later figures, qPCR analysis of some osteogenic genes in Figure 1 might be helpful.
We appreciate the opportunity to explore this question further. We conducted mineralization experiments in triplicate and performed qRT-PCR, assessing for gene expression of 5 osteogenic genes: ALPL, BGLAP (osteocalcin), COL1A1, RUNX2, and SP7. Results are shown in Figure 1C and this text was added to Results: “Additionally, PCR analysis of HBO1 cells from a repeat experiment collected at days 7, 14, and 21 showed significantly increased expression of osteogenic genes with JAG1-bds stimulation (Figure 1C). ALPL was significantly expressed at Day 7, with a 3.5-fold increase (p=0.0004) compared to HBO1 cells grown in growth media. In contrast, significant expression levels of COL1A1 and BGLAP were observed at 14 days, with a 5.1-fold increase (p=0.0021) of COL1A1 and a 12.3-fold increase (p=0.0002) of BGLAP when compared to growth media conditions. Interestingly, while some mineralization is observed in the osteogenic media and Fc-bds (Figure 1A) conditions, there were no significant increases in osteogenic gene expression (Figure 1C). Expression of RUNX2 and SP7 was not significantly altered across all conditions and time points (not shown).”
(2) In Fig 2: even though not needed in respect to the hypothesis, was there any Control group without any cells or JAG1 beads? What were the changes between that group and the cells-only group?
We have not observed differences between the “Empty Defect” group and the “Cells alone” group. We have addressed the reviewer’s comments by adding this comparison in Supplementary Figure 4.
(3) Transcriptional profiling and ELISA (Fig 3 and 4) show upregulation of NF-ĸB signaling in response to JAG1. In the discussion, the authors have referenced a previous study showing NF-ĸB as prosurvival in human OB cells. However, based on many published reports, NF-ĸB activation has been shown to inhibit OB function. Does JAG1 regulate HBO cell survival via NF-ĸB activation?
Experiments using an NF-ĸB inhibitor could help show that JAG1-mediated NF-ĸB activation is anabolic in this experimental setup.
We thank the reviewer for this excellent suggestion. We are eager to explore this new direction for our research in a subsequent study. We have added this to our future directions.
(4) Fig 5:
(A) A condition showing JAG1 + DAPT is needed to compare between JAG1 canonical and non-canonical signaling.
Thank you for pointing this out. We have added Supplementary Figure 6, which includes a dose response experiment for JAG1 + DAPT.
(B) S6K18 alone seems to be increasing OB mineralization. Is that statistically significant?
No, and we have added the statistical analysis for S6K-18 to Figure 5B.
(C) Fc alone condition seems to have a very significant increase in OB mineralization. Does Fc alone upregulate OB function?
We do see some upregulation of mineralization with Fc in vitro, which we also observed in our previous studies with mouse neural crest cells, but we have not found it to be osteogenic in vivo. We have added a statement to this effect, with references. Additionally, osteogenic gene expression was not upregulated in our in vitro mineralization experiments with Fc. See Revised Figure 1.
(D) Although overall quantification shows that S6K18 partially inhibits HBO mineralization, the representative images do not represent the quantification. Transcriptional analysis (qPCR) is required to validate these findings.
We performed qRT-PCR on cells from a repeat mineralization assay, collecting cells at 9, 14, and 21 days. We have added the following to the Results: “While inhibition of NOTCH and p70 S6K decreased mineralization in our mineralization assay, there are no statistically significant changes in gene expression for ALPL, COL1A1, or BGLAP (Supplementary Figure 7). These results suggest that the HBO cell phenotypes are maturing into osteocytes and that inhibiting p70 S6K hinders the cellular ability to mineralize but not the cell phenotype progression.”
(5) Finally, to convincingly conclude the data from Fig 5, the mouse CF model can be helpful to support the authors' claim that JAG1 acts via p70 S6K.
Thank you for this feedback. We have modified our conclusions to reflect that p70 S6K is one of the non-canonical pathways that JAG1 may be activating in bone regeneration.
Thank you very much for your consideration of our revised manuscript.
-
-
www.medrxiv.org www.medrxiv.org
-
eLife assessment
This useful manuscript describes a proteomic analysis of plasma from subjects before and after an exercise regime consisting of endurance and resistance exercise. The work identifies a putative new exerkine, CD300LG, and finds associations of this protein with aspects of insulin sensitivity and angiogenesis. The characterization remains incomplete at present. Because CD300LG may have a transmembrane domain, one possibility is that exercise causes the release of extracellular vesicles containing this protein. As this study reports associations, additional studies will be needed to establish causality. The paper will hopefully prompt further studies to more fully elucidate the underlying biology.
-
Reviewer #1 (Public Review):
Summary:
In this paper, proteomics analysis of the plasma of human subjects that underwent an exercise training regime consisting of a combination of endurance and resistance exercise led to the identification of several proteins that were responsive to exercise training. Confirming previous studies, many exercise-responsive secreted proteins were found to be involved in the extra-cellular matrix. The protein CD300LG was singled out as a potential novel exercise biomarker and the subject of numerous follow-up analyses. The levels of CD300LG were correlated with insulin sensitivity. The analysis of various open-source datasets led to the tentative suggestion that CD300LG might be connected with angiogenesis, liver fat, and insulin sensitivity. CD300LG was found to be most highly expressed in subcutaneous adipose tissue and specifically in venular endothelial cells. In a subset of subjects from the UK Biobank, serum CD300LG levels were positively associated with several measures of physical activity - particularly vigorous activity. In addition, serum CD300LG levels were negatively associated with glucose levels and type 2 diabetes. Genetic studies hinted at these associations possibly being causal. Mice carrying alterations in the CD300LG gene displayed impaired glucose tolerance, but no change in fasting glucose and insulin. Whether the production of CD300LG is changed in the mutant mice is unclear.
Strengths:
The specific proteomics approach conducted to identify novel proteins impacted by exercise training is new. The authors are resourceful in the exploitation of existing datasets to gain additional information on CD300LG.
Weaknesses:
While the analyses of multiple open-source datasets are necessary and useful, they lead to relatively unspecific correlative data that collectively insufficiently advance our knowledge of CD300LG and merely represent the starting point for more detailed investigations. Additional more targeted experiments of CD300LG are necessary to gain a better understanding of the role of CD300LG and the mechanism by which exercise training may influence CD300LG levels. One should also be careful to rely on external data for such delicate experiments as mouse phenotyping. Can the authors vouch for the quality of the data collected?
-
Reviewer #2 (Public Review):
Summary:
This manuscript from Lee-Odegard et al reports proteomic profiling of exercise plasma in humans, leading to the discovery of CD300LG as a secreted exercise-inducible plasma protein. Correlational studies show associations of CD300LG with glycemic traits. Lastly, the authors query available public data from CD300LG-KO mice to establish a causal role for CD300LG as a potential link between exercise and glucose metabolism. However, the strengths of this manuscript were balanced by the moderate to major weaknesses. Therefore in my opinion, while this is an interesting study, the conclusions remain preliminary and are not fully supported by the experiments shown so far.
Strengths:
(1) Data from a well-phenotyped human cohort showing exercise-inducible increases in CD300LG.
(2) Associations between CD300LG and glucose and other cardiometabolic traits in humans, that have not previously been reported.
(3) Correlation to CD300LG mRNA levels in adipose provides additional evidence for exercise-inducible increases in CD300LG.
Weaknesses:
(1) CD300LG is by sequence a single-pass transmembrane protein that is exclusively localized to the plasma membrane. How CD300LG can be secreted remains a mystery. More evidence should be provided to understand the molecular nature of circulating CD300LG. Is it full-length? Is there a cleaved fragment? Where is the epitope where the o-link is binding to CD300LG? Does transfection of CD300LG to cells in vitro result in secreted CD300LG?
(2) There is a growing recognition of specificity issues with both the O-link and somalogic platforms. Therefore it is critical that the authors use antibodies, targeted mass spectrometry, or some other methods to validate that CD300LG really is increased instead of just relying on the O-link data.
(3) It is insufficient simply to query the IMPC phenotyping data for CD300LG; the authors should obtain the animals and reproduce or determine the glucose phenotypes in their own hands. In addition, this would allow the investigators to answer key questions like the phenotype of these animals after a GTT, whether glucose production or glucose uptake is affected, whether insulin secretion in response to glucose is normal, effects of high-fat diet, and other standard mouse metabolic phenotyping assays.
(4) I was unable to find the time point at which plasma was collected at the 12-week time point. Was it immediately after the last bout of exercise (an acute response) or after some time after the training protocol (trained state)?
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1 (Public Review):
Summary:
In this paper, proteomics analysis of the plasma of human subjects that underwent an exercise training regime consisting of a combination of endurance and resistance exercise led to the identification of several proteins that were responsive to exercise training. Confirming previous studies, many exercise-responsive secreted proteins were found to be involved in the extra-cellular matrix. The protein CD300LG was singled out as a potential novel exercise biomarker and the subject of numerous follow-up analyses. The levels of CD300LG were correlated with insulin sensitivity. The analysis of various open-source datasets led to the tentative suggestion that CD300LG might be connected with angiogenesis, liver fat, and insulin sensitivity. CD300LG was found to be most highly expressed in subcutaneous adipose tissue and specifically in venular endothelial cells. In a subset of subjects from the UK Biobank, serum CD300LG levels were positively associated with several measures of physical activity - particularly vigorous activity. In addition, serum CD300LG levels were negatively associated with glucose levels and type 2 diabetes. Genetic studies hinted at these associations possibly being causal. Mice carrying alterations in the CD300LG gene displayed impaired glucose tolerance, but no change in fasting glucose and insulin. Whether the production of CD300LG is changed in the mutant mice is unclear.
Strengths:
The specific proteomics approach conducted to identify novel proteins impacted by exercise training is new. The authors are resourceful in the exploitation of existing datasets to gain additional information on CD300LG.
Weaknesses:
While the analyses of multiple open-source datasets are necessary and useful, they lead to relatively unspecific correlative data that collectively insufficiently advance our knowledge of CD300LG and merely represent the starting point for more detailed investigations. Additional more targeted experiments of CD300LG are necessary to gain a better understanding of the role of CD300LG and the mechanism by which exercise training may influence CD300LG levels. One should also be careful to rely on external data for such delicate experiments as mouse phenotyping. Can the authors vouch for the quality of the data collected?
Thank you for the valuable feedback on our manuscript. We recognize concerns about the specificity of correlative data from open-source datasets and the limitations it presents for understanding CD300LG's role. To address this, we have expanded the manuscript with a paragraph in the discussion regarding the need for targeted experiments to confirm CD300LG’s functions and relationship with glucose metabolism. We also emphasize caution regarding external data reliance and we acknowledge the need for generating primary data, including direct phenotyping of mice with CD300LG gene alterations, to better understand its regulatory mechanisms and effects on glucose tolerance. Please see lines 446-456.
Reviewer #2 (Public Review):
Summary:
This manuscript from Lee-Odegard et al reports proteomic profiling of exercise plasma in humans, leading to the discovery of CD300LG as a secreted exercise-inducible plasma protein. Correlational studies show associations of CD300LG with glycemic traits. Lastly, the authors query available public data from CD300LG-KO mice to establish a causal role for CD300LG as a potential link between exercise and glucose metabolism. However, the strengths of this manuscript were balanced by the moderate to major weaknesses. Therefore in my opinion, while this is an interesting study, the conclusions remain preliminary and are not fully supported by the experiments shown so far.
Strengths:
(1) Data from a well-phenotyped human cohort showing exercise-inducible increases in CD300LG.
(2) Associations between CD300LG and glucose and other cardiometabolic traits in humans, that have not previously been reported.
(3) Correlation to CD300LG mRNA levels in adipose provides additional evidence for exercise-inducible increases in CD300LG.
Weaknesses:
(1) CD300LG is by sequence a single-pass transmembrane protein that is exclusively localized to the plasma membrane. How CD300LG can be secreted remains a mystery. More evidence should be provided to understand the molecular nature of circulating CD300LG. Is it full-length? Is there a cleaved fragment? Where is the epitope where the o-link is binding to CD300LG? Does transfection of CD300LG to cells in vitro result in secreted CD300LG?
(2) There is a growing recognition of specificity issues with both the O-link and somalogic platforms. Therefore it is critical that the authors use antibodies, targeted mass spectrometry, or some other methods to validate that CD300LG really is increased instead of just relying on the O-link data.
(3) It is insufficient simply to query the IMPC phenotyping data for CD300LG; the authors should obtain the animals and reproduce or determine the glucose phenotypes in their own hands. In addition, this would allow the investigators to answer key questions like the phenotype of these animals after a GTT, whether glucose production or glucose uptake is affected, whether insulin secretion in response to glucose is normal, effects of high-fat diet, and other standard mouse metabolic phenotyping assays.
(4) I was unable to find the time point at which plasma was collected at the 12-week time point. Was it immediately after the last bout of exercise (an acute response) or after some time after the training protocol (trained state)?
We acknowledge the importance of understanding the molecular form of CD300LG in circulation. We have expanded the discussion with a paragraph regarding the need for follow-up experiments to determine whether circulating CD300LG is full-length or a cleaved fragment, to identify the epitope for O-link binding, and to assess CD300LG secretion in vitro through transfection experiments. We also discuss the need for targeted mass spectrometry and antibody-based validation of O-link measurements of CD300LG, and the need for more validation experiments on CD300LG-deficient mice. Please see lines 446-456.
The plasma collected post-intervention is in a state that reflects the new baseline trained condition of the subjects, 3 days after the last exercise session during the intervention. We have clarified this in our manuscript. The information is updated in lines 491-493.
Reviewer #1 (Recommendations For The Authors):
In the present form, the paper raises interest in the potential role of CD300LG in the response to exercise training but unfortunately does not provide clear answers. The authors should focus their efforts on firmly validating the status of CD300LG as an exercise biomarker in humans and carefully examine the function of CD300LG through mechanistic and animal-based studies.
The authors are encouraged to acquire CD300LG-deficient mice and perform specific experiments to validate hypotheses forthcoming from the analysis of the open-source datasets. In addition, it needs to be validated that the cd300lgtm1a(KOMP)Wtsi mice are actually deficient in CD300LG. It is not uncommon that Tm1a mice have (almost) normal expression of the targeted gene.
We have now revised the manuscript and added a new section to the discussion regarding the limitations with open-source data, cd300lgtm1a(KOMP)Wtsi mice and the need for more validation experiments on CD300LG-deficient mice. Please see lines 446-456.
The value of the correlative data presented in Figure 5 is rather limited. The same can be argued for the data presented in Supplementary Figure 2. If CD300LG is expressed in endothelial cells, it stands to reason that its expression is correlated with angiogenesis. Hence, this observation does not really carry any additional value.
We agree that correlations cannot imply causality. However, similar patterns were observed in several tissues and across different data sets, which at least suggest a role for CD300LG related to angiogenesis. We have included a section in the discussion where we clarify that our observations should only be regarded as indications and that follow-up studies are needed to confirm any causal role for CD300LG in angiogenesis/oxidative capacity. Please see lines 446-456.
Figure 6 may be better accommodated in the supplement.
Figure 6 is now moved to the supplement.
Figure 3A and B are a bit awkward. The description "no overlap" is confusing. Isn't it more accurate to say "no enrichment" or "no over-representation"? There will always be some overlap with certain pathways. However, there may be no enrichment. Furthermore, the use of arrows to indicate No overlap is visually not very appealing. Maybe the numbers can be given a specific color?
We have now removed the arrows and text, and instead stated in the text that there were no enrichments other than for the proteins down-regulated in the overweight group.
The description of the figure legend of figure 5E-H is incomplete.
The description is now completed.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This study provides useful insights into inter- and intra-site B cell receptor repertoire heterogeneity, noting that B cell clones from the tumour interact more with their draining lymph node than with the blood and that there is less mutation/expansion/activation of B cell clones in tumours. Unfortunately, the main claims are incomplete and only partially supported. The work could be of interest to an audience including medical biologists/immunologists and computational biologists across cancer specialities.
-
Reviewer #3 (Public Review):
In multiple cancers, the key roles of B cells are emerging in the tumor microenvironment (TME). The authors of this study appropriately introduce that B cells are relatively under-characterised in the TME and argue correctly that it is not known how the B cell receptor (BCR) repertoires across tumor, lymph node and peripheral blood relate. The authors therefore supply a potentially useful study evaluating the tumor, lymph node and peripheral blood BCR repertoires and site-to-site as well as intra-site relationships. The authors employ sophisticated analysis techniques, although the description of the methods is incomplete.
Major strengths:
(1) The authors provide a unique analysis of BCR repertoires across tumor, dLN, and peripheral blood. The work provides useful insights into inter- and intra-site BCR repertoire heterogeneity. While patient-to-patient variation is expected, the findings with regard to intra-tumor and intra-dLN heterogeneity with the use of fragments from the same tissue are of importance, contribute to the understanding of the TME, and will inform future study design.
(2) A particular strength of the study is the detailed CDR3 physicochemical properties analysis which leads the authors to observations that suggest a less-specific BCR repertoire of TIL-B compared to circulating B cells.
Concerns and comments on current version:
The revision has improved the manuscript but, in my opinion, remains inadequate. While most of my requested changes have been made, I do not see an expansion of the Fig1A legend to incorporate more details about the analysis. Lacking details of methodology was a concern from all reviewers. Similarly, the 'fragmented' narrative was a concern of all reviewers. These matters have not been dealt with adequately: there are parts of the manuscript which remain fragmented and confusing. The narrative and analysis do not adequately explain how the plasma cell bias has been dealt with and are in fact simply confusing. There is a paragraph at the beginning of the discussion regarding the plasma cell bias, which should be rewritten to be clearer and moved to a prominent place early in the results. Why are these results not properly presented? They are key for interpretation of the manuscript. Furthermore, the sorted plasma cell sequencing analysis has only been performed on two patients. Another issue is that some disease cohorts are entirely composed of patients with metastasis and some without, but metastasis is not mentioned. Metastasis has been shown to impact the immune landscape.
A reviewer brought up a concern about the overlap analysis and I also asked for an explanation of why the F2 metric was chosen. Part of the rebuttal argues that another metric was explored showing similar results, thus the conclusion reached is reasonable. Remarkably, these data are not only omitted from the manuscript, but are not even provided to the reviewers.
This manuscript certainly includes some interesting and useful work. Unfortunately, a comprehensive re-write was required to make the work much clearer and easier to understand and this has not been realised.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
The authors attempt to fully characterize the immunoglobulin (Ig) heavy (H) chain repertoire of tumor-infiltrating B cells from three different cancer types by identifying the IgH repertoire overlap between these, their corresponding draining lymph nodes (DLNs), and peripheral B cells. The authors claim that B cells from tumors and DLNs have a closer IgH profile than those in peripheral blood and that DLNs are differentially involved with tumor B cells. The claim that tumor-resident B cells are more immature and less specific is made based on the characteristics of the CDR-H3 they express.
Strengths:
The authors show great expertise in developing in-house bioinformatics pipelines, as well as using tools developed by others, to explore the IgH repertoire expressed by B cells as a means of better characterizing tumor-associated B cells for the future generation of tumor-reactive antibodies as a therapy.
Weaknesses:
This paper needs major editing, both of the text and the figures, because as it stands it is convoluted and extremely difficult to follow. The conclusions reached are often not obvious from the figures themselves. Sufficient a priori details describing the framework for their analyses are not provided, making the outcome of their results questionable and leaving the reader wondering whether the findings are on solid ground.
The authors are encouraged to explain in more detail the premises used in their algorithms, as well as the criteria they follow to define clonotypes, clonal groups, and clonal lineages, which are currently poorly defined and are crucial elements that may influence their results and conclusions.
In response to this comment, we significantly expanded the paragraph dedicated to the tumor and non-tumor repertoire overlap and isotype composition. The following sections were added:
First, we characterized the relative similarity of IGH repertoires derived from tumors, DLN, and PBMC on the individual CDR-H3 clonotype level. We define a clonotype as an instance with an identical CDR-H3 nucleotide sequence and identical V- and J-segment attribution (isotype attribution may be different). Unlike other authors, here we do not pool together similar CDR-H3 sequences to account for hypermutation. (Hypermutation analysis is done separately and defined as clonal group analysis.)
As overlap metrics are dependent on overall repertoire richness, we normalized the comparison using the same number of top most frequent clonotypes of each isotype from each sample (N = 109). Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point.
We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (D_ij = d_ij / (d_i × d_j), where d_ij is the number of clonotypes present in both samples, while d_i and d_j are the diversities of samples i and j respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both if clonotype frequency is taken into account, and when it is not.
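The clonotype definition and top-N normalization quoted above can be illustrated with a minimal Python sketch. The function names, the tuple layout of the input, and the choice to renormalize frequencies within the retained top N are assumptions made for illustration; they are not taken from the manuscript or from any specific tool.

```python
from collections import Counter, defaultdict


def collapse_clonotypes(reads):
    """Count clonotypes per isotype.

    reads: iterable of (cdr3_nt, v_gene, j_gene, isotype) tuples (hypothetical layout).
    A clonotype is keyed by identical CDR-H3 nucleotide sequence plus V and J
    attribution; similar CDR-H3 sequences are deliberately NOT merged.
    """
    counts = defaultdict(Counter)
    for cdr3_nt, v_gene, j_gene, isotype in reads:
        counts[isotype][(cdr3_nt, v_gene, j_gene)] += 1
    return counts


def top_n_frequencies(clonotype_counts, n):
    """Keep the n most frequent clonotypes and renormalize their frequencies.

    Renormalizing within the top n is an assumption of this sketch.
    """
    top = clonotype_counts.most_common(n)
    total = sum(count for _, count in top)
    return {clonotype: count / total for clonotype, count in top}


if __name__ == "__main__":
    example_reads = [
        ("TGTGCGAGA", "IGHV1-2", "IGHJ4", "IGHG"),
        ("TGTGCGAGA", "IGHV1-2", "IGHJ4", "IGHG"),
        ("TGTGCGAAA", "IGHV3-23", "IGHJ6", "IGHG"),
        ("TGTGCGAGA", "IGHV1-2", "IGHJ4", "IGHA"),
    ]
    per_isotype = collapse_clonotypes(example_reads)
    print(top_n_frequencies(per_isotype["IGHG"], n=2))
```

Fixing the same N for every sample and isotype keeps the subsequent overlap comparison from being driven by differences in repertoire richness, which is the rationale given in the quoted text.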
Having excluded the IGHD gene segment from some of their analyses (at least those related to clonal lineage inference and phylogenetic trees), it is not well explained which region of CDR-H3 is responsible for the charge, interaction strength, and Kidera factors, since in some cases the authors mention that the central part of CDR-H3 consists of five amino acids and in others of seven amino acids.
We considered different ways of calculating amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. Now plots for Fig S6 C are updated for consistency and the parameters depicted there are now calculated using 5 central amino acids, as in other sections.
How can the authors justify that the threshold for CDR-H3 identity varies according to individual patient data?
The ideal similarity threshold may depend on several factors, such as sampling, sequencing depth, etc. For example, imagine a sample picking up 100% of the clonal lineage sequences, which differ by only 1 amino acid from each other, and a worse quality sample/sequencing run picking up only every other sequence. Obviously, the minimal threshold required to accumulate these into a cluster/clonal group would be different for these two cases (1 aa for the former, and ~2 aa for the latter for single-linkage clustering). In other words, the greater the sequencing depth, the denser the clusters will be. The method of individual threshold tailoring relies on the following: https://changeo.readthedocs.io/en/latest/examples/cloning.html
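The thought experiment above can be reproduced with a toy single-linkage clustering in Python. This is only a sketch under stated assumptions: the sequences are invented, distance is a plain Hamming distance over equal-length strings, and the helper names are hypothetical. It is not the Change-O procedure referenced in the link.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def count_single_linkage_clusters(seqs, threshold):
    """Count clusters when any two sequences within `threshold` mismatches are linked."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            same_length = len(seqs[i]) == len(seqs[j])
            if same_length and hamming(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


if __name__ == "__main__":
    # Invented lineage in which each member differs from the previous one by 1 residue.
    full_sampling = ["AAAAA", "AAAAC", "AAACC", "AACCC", "ACCCC"]
    # Shallower sampling that recovers only every other member.
    sparse_sampling = full_sampling[::2]

    for label, seqs in (("full", full_sampling), ("sparse", sparse_sampling)):
        for threshold in (1, 2):
            n = count_single_linkage_clusters(seqs, threshold)
            print(f"{label} sampling, threshold {threshold}: {n} cluster(s)")
```

With full sampling a threshold of 1 already joins the whole lineage, whereas the sparse sample needs a threshold of 2, which is the dependence on sampling depth that the response describes.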
Although the individual Kidera factors that are significant in the context of our analysis are described in the text one by one on their first appearance, we have now also added a sentence to describe Kidera factor analysis in general (page 8):
Kidera factors are a set of scores which quantify physicochemical properties of protein sequences (Nakai et al. 1988). 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques.
Throughout the analyses, the reasons for choosing one type of cancer over another sometimes seem subjective and are not well justified in the text.
Whenever possible, we pooled all patients with all cancer types together, because the number of available samples did not allow us to draw any significant conclusions comparing between individual cancer types. When analyzing and showing individual patient data, we also did not attempt to depict any cancer-type-specific findings, but it is inevitable that we name a specific cancer type when labelling a sample coming from a specific tumor.
Overall, the narrative is fragmented. There is a lack of well-defined conclusions at the end of the results subheadings.
In addition to the changes described above, a conclusion was added to the paragraph describing hypermutation analysis:
IGHG clonotypes from lung cancer samples show a higher number of hypermutations, possibly reflecting the high mutational load found in lung cancer tissue. For melanoma, another cancer known for high mutational load, no statistically significant difference was found. This may be due to higher variance between melanoma samples, which hinders the analysis, or due to the small sample size.
The exact same paragraph is repeated twice in the results section.
Corrected.
The authors have also failed to synchronise the actual number of main figures with the text, and some panels are included in the main figures that are neither described nor mentioned in the text (Venn diagram Fig. 2A and phylogenetic tree Fig. 5D). Overall, the manuscript appears to have been rushed and not thoroughly read before submission.
Corrected.
Reviewers are forced to wade through, unravel, and validate poorly explained algorithms in order to understand the authors' often bold conclusions.
We hope that the aforementioned additions to the text, together with the addition to Figure 1, make the narrative easier to understand.
Reviewer #2 (Public Review):
Summary:
The authors sampled the B cell receptor repertoires of cancers, their draining lymph nodes, and blood. They characterized the clonal makeup of all B cells sampled and then analyzed these clones to identify clonal overlap between tissues and clonal activation as expressed by their mutation level and CDR3 amino acid characteristics and length. They conclude that B cell clones from the tumor interact more with their draining lymph node than with the blood and that there is less mutation/expansion/activation of B cell clones in tumors. These conclusions are interesting but hard to verify due to the under-sampling and short sequencing reads, as well as confusion as to when analysis is across all individuals or of select individuals.
Strengths:
The main strength of their analysis is that they take into account multiple characteristics of clonal expansion and activation and their different modes of visualization, especially of clonal expansion and overlap. The triangle plots once one gets used to them are very nice.
Weaknesses:
The data used appear inadequate for the conclusions reached. The authors' sample size of B cells is small and they do not address how it could be sufficient. At such low sampling rates, compounded by the plasmablast bias they mention, it is unclear if the overlap trends they observe show real trends. Analyzing only top clones by size does not solve this issue, as it could be that the top 100 clones of one tissue are much bigger than those of another and that all overlap trends arise simply because the clones are bigger in one tissue or the other; i.e., there may be equal overlap of clones with blood, but blood is not sufficiently sampled given its greater diversity and smaller clones.
Regarding the number of clonotypes to be taken into account, we were limited by the B cell infiltration of tumor samples and our ability to capture their repertoire. However, we use technical replicates on the level of cell suspension to ensure that at least top clonotypes are consistently sampled. So, this is how the data should be interpreted - as describing the most abundant clones in the repertoire (which also may be considered the most functionally relevant in case of tumor infiltrating lymphocytes).
To analyze the repertoire overlap, we generally use the F2 metric that takes clone size into account - because we think that clone size is an important functional factor. However, we have now added the description of using D metric (does not include clone frequency as a parameter) - which shows exactly the same trend as F2 metric. So, both F2 and D overlap metrics support our conclusion of higher overlap between tumor and LN.
The following text was added:
We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (D_ij = d_ij / (d_i × d_j), where d_ij is the number of clonotypes present in both samples, while d_i and d_j are the diversities of samples i and j respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both if clonotype frequency is taken into account, and when it is not.
All in all, of course, the deeper the better, but given the data we were able to generate from the samples, this was the best approach to normalization that could be used.
Similarly, the read length (150 bp × 2) is too short, missing FWR1 and CDR1 and often parts of FWR2 if CDR3 is long. As the authors themselves note (and as was shown in Zhang 2015, PMC4811607), this makes mutation analysis difficult.
Indeed, we are aware of this problem, and therefore only a small part of the manuscript is dedicated to the hypermutation analysis. However, as the CDR-H3 region is the most mutated part, we still can capture significant diversity of mutations. To address the question of applicability of our data for the hypermutation phylogeny analysis, we compare the distribution of physico-chemical properties along the trees of hypermutation using the 150+150 and 300+300 data from the same donor and the same set of samples. The main conclusion is that neither for long, nor for short datasets could any correlation of physicochemical properties of the CDR-H3 region with the rank of the clonotype on the tree be found.
It also makes the identification of V genes and thus clonal identification ambiguous. This issue becomes especially egregious when clones are mutated.
Again, this would be important for clonotype phylogeny analysis. However, for the simple questions that we address with our clonal group analysis, such as clonal group overlap between tissues, we consider these data acceptable, because if any mislabelling of the V segment occurs, it is (a) rare and (b) equally frequent in all types of samples. Therefore, any conclusions made are still valid despite this technical drawback.
To directly address the question of mislabelling of V-genes in our data, we looked at the average number of different V-genes attributed to the same nucleotide sequence of CDR-H3 region in the short (150+150) and long (300+300) datasets from the same donor. Indeed, some ambiguity of V-gene labelling is observed (see below), but we think that it is unlikely to influence any of our cautious conclusions.
Author response image 1.
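The V-gene ambiguity check described above (the average number of different V genes attributed to the same CDR-H3 nucleotide sequence) could be computed along the following lines. This is a hedged sketch: the function name, the record layout, and the example records are hypothetical and carry no data from the study.

```python
from collections import defaultdict


def mean_v_genes_per_cdr3(records):
    """Average number of distinct V genes assigned to each CDR-H3 nucleotide sequence.

    records: iterable of (cdr3_nt, v_gene) pairs (hypothetical layout).
    A value of 1.0 means every CDR-H3 sequence has a single, unambiguous V gene.
    """
    v_genes_by_cdr3 = defaultdict(set)
    for cdr3_nt, v_gene in records:
        v_genes_by_cdr3[cdr3_nt].add(v_gene)
    if not v_genes_by_cdr3:
        return 0.0
    total_distinct = sum(len(genes) for genes in v_genes_by_cdr3.values())
    return total_distinct / len(v_genes_by_cdr3)


if __name__ == "__main__":
    # Invented records: one CDR-H3 receives two V calls in the short-read data.
    short_reads = [
        ("TGTGCGAGA", "IGHV3-23"),
        ("TGTGCGAGA", "IGHV3-30"),
        ("TGTGCGAAA", "IGHV1-2"),
    ]
    long_reads = [
        ("TGTGCGAGA", "IGHV3-23"),
        ("TGTGCGAAA", "IGHV1-2"),
    ]
    print("short:", mean_v_genes_per_cdr3(short_reads))
    print("long:", mean_v_genes_per_cdr3(long_reads))
```

Comparing this value between the 150+150 and 300+300 datasets from the same donor gives a single number per dataset that summarizes how often V-gene calls disagree.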
Finally, it is not completely clear when the analysis is of single individuals or across all individuals. If it is the former the authors did not explain how they chose the individuals analyzed and if the latter then it is not clear from the figures which measurements belong to which individual (i.e they are mixing measurements from different people).
We addressed this issue by adding a comment to each figure caption, describing whether a particular figure or panel describes individual or pooled data, and also whether the analysis is done on individual clonotype or clonal group level.
Also, in case pooled data were used, we added the number of patients that was pooled for a particular type of analysis. This number differs from one type of analysis to the other, because not all the patients had a complete set of tissues, and also not all samples passed a quality check for a particular analysis.
Here are the numbers listed:
Fig 2A: N=6 (we were only considering those who had all three tissues)
Fig 2C: N=14 (all)
Fig 2D: N=14 (all)
Fig 2E: N=7 (have both tumor and PBMC)
Fig 2F: N=9 (have both tumor and PBMC)
Fig 2G: N=9 (have both tumor and PBMC)
Fig 2H: N=7 (have both tumor and LN)
Fig 3A: N=14 (all)
Fig 3B: N=11 (only those with tumor)
Fig 3E: N=14
Fig 7F: N=11 (all that have tumor)
Reviewer #3 (Public Review):
In multiple cancers, the key roles of B cells are emerging in the tumor microenvironment (TME). The authors of this study appropriately introduce that B cells are relatively under-characterised in the TME and argue correctly that it is not known how the B cell receptor (BCR) repertoires across tumors, lymph nodes, and peripheral blood relate. The authors therefore supply a potentially useful study evaluating the tumor, lymph node, and peripheral blood BCR repertoires and site-to-site as well as intra-site relationships. The authors employ sophisticated analysis techniques, although the description of the methods is incomplete. Among other interesting observations, the authors argue that the tumor BCR repertoire is more closely related to that of draining lymph node (dLN) than the peripheral blood in terms of clonal and isotype composition. Furthermore, the authors' findings suggest that tumor-infiltrating B cells (TIL-B) exhibit a less mature and less specific BCR repertoire compared with circulating B cells. Overall, this is a potentially useful work that would be of interest to both medical and computational biologists working across cancer. However, there are aspects of the work that would have benefitted from further analysis and areas of the manuscript that could be written more clearly and proofread in further detail.
Major Strengths:
(1) The authors provide a unique analysis of BCR repertoires across tumor, dLN, and peripheral blood. The work provides useful insights into inter- and intra-site BCR repertoire heterogeneity. While patient-to-patient variation is expected, the findings with regard to intra-tumor and intra-dLN heterogeneity with the use of fragments from the same tissue are of importance, contribute to the understanding of the TME, and will inform future study design.
(2) A particular strength of the study is the detailed CDR3 physicochemical properties analysis which leads the authors to observations that suggest a less-specific BCR repertoire of TIL-B compared to circulating B cells.
Major Weaknesses:
The study would have benefitted from a deeper biological interpretation of the data. While given the low number of patients one can plausibly understand a reluctance to speculate about clinical details, there is limited discussion about what may contribute to observed heterogeneity.
We indeed do not want to overinterpret our data, especially when it comes to the differences between types of cancer. On the other hand, extracting similar patterns between different cancer types allows us to pinpoint mechanisms that are more general and do not depend on cancer type. As for the potential source of the intratumoral heterogeneity that we observe, we think that it may come from the selective sampling of tertiary lymphoid structures. We include IHC data for TLS detection in supplementary Fig. 5. Also, tumor mutation clonality may correlate with differential antibody response (i.e. different IGH clonotypes developing to recognize different antigens), as has been previously described for TCRs by the lab of B. Chain in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6890490/.
For example, for the analysis of three lymph nodes taken per patient which were examined for inter-LN heterogeneity, there is a lack of information regarding these lymph nodes.
Unfortunately no clinical information about the lymph nodes was available.
'LN3' is deemed as exhibiting the most repertoire overlap with the tumor but there is no discussion as to why this may be the case.
The following sentences address this in the “LN-to-LN heterogeneity in colorectal cancer” paragraph:
Similarly, an unequal interaction of tumors with DLNs was observed at the level of hypermutating clonal groups.
Functionally, this may again indicate that within a group of DLNs, nodes are unequal in terms of access to tumor antigens, and this inequality shapes the BCR repertoires within these lymph nodes.
(2) At times the manuscript is difficult to follow. In particular, the 'Intra-LN heterogeneity' section follows the 'LN-LN heterogeneity in colorectal cancer' section and compares the overlap of LN fragments (LN11, LN21, LN31) with the tumor in two separate patients (Fig 6A). In the previous section (LN-LN), LN11, LN21, LN31 are names given to separate lymph nodes from the same patient. The fragments are referred to as 'LN2' and the nodes in the previous section are referred to similarly. This conflation of naming for nodes and fragments is confusing.
We corrected this.
(3) There is a duplicated paragraph in 'Short vs long trees' and the following section 'Productive involvement in hypermutation lineages depends on CDR3 characteristics'.
Corrected.
Reviewer #1 (Recommendations For The Authors):
- Figures:
Figure 1A lacks resolution
Corrected
Figure 2A, Venn diagram: What do the colors indicate?
Corrected
Figure 5D, why include this tree when there is no mention of it in the text?
Described
Figures 8, 9, and 10 are not to be found. One should not have to figure out that they became supplementary in the end.
Corrected
Regarding the physicochemical properties of CDR-H3, what do the authors mean by "the central part"? Do the authors refer to the CDR-H3 loop, and if so, how is that defined when the IGHD gene segment is excluded from the analyses? Is it 5 amino acids (Productive involvement in hypermutating lineages depends on CDR3 characteristics, Page 21/39 in merged document) and (CDR3 properties, Page 8/39 in merged document), or 7 amino acids (Short vs long trees phylogeny analysis, Page 19/39 in merged document)? Please clarify.
We considered different ways of calculating amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. Now plots for Fig S6 C are updated for consistency. IGHD segment was not excluded from the analysis. The reviewer might be confused by our description of phylogenetic inference, when an artificial outgroup with D segment deleted is added to the clonal group to facilitate the inference process. All other sequences were analyzed in their original form with the D segment. This way, we could avoid biases in phylogeny introduced by misassignment of D gene germline to the outgroup.
What was the threshold for CDR-H3 identity in their analyses? How can the authors justify that this value changes according to individual patient datasets? (Materials & methods, Clonal lineage inference Page 29/39 in merged document).
As described earlier, the ideal similarity threshold may depend on several factors, such as sampling, sequencing depth, etc. For example, imagine a sample picking up 100% of the clonal lineage sequences, which differ by only 1 amino acid from each other, and a worse quality sample/sequencing run picking up only every other sequence. Obviously, the minimal threshold required to accumulate these into a clonal group would be different for these two cases (1 aa for the former, and ~2 aa for the latter for single-linkage clustering). The method of individual threshold tailoring relies on this: https://changeo.readthedocs.io/en/latest/examples/cloning.html
What is the difference between tumor-induced and tumor-infiltrating B cells? How can the authors discriminate between the two? Page 6/39 in the merged document.
corrected to tumor-infiltrating
"Added nucleotides" meaning N additions? Page 3/39 in the merged document.
yes
How many cancer patients were enrolled? 17 or 14 (Materials & methods, page 27/39 in the merged document)? Please clarify.
In the current project 14 patients were enrolled. The appropriate changes have been introduced in the final text. Supplementary table 2 has been added with the patient data.
Abbreviations are used without full descriptions.
Following the reviewer’s recommendation, a list of abbreviations was added to the manuscript, and full descriptions were added in the text upon first mention of each term.
Use either CDR3 or CDR-H3
We corrected the text to use CDR-H3 abbreviation throughout the text.
Reviewer #2 (Recommendations For The Authors):
I would like to start by apologizing for the time it took me to review.
As I mentioned above, there are issues with the clonal sampling, the sequencing length, and the statistics in this paper. From reading the paper I am not sure if they are fixable but there are some things that could be tried.
(1) The authors mention the diversity of their individual analysis - 17 individuals across 3 cancer types, but do not then systematically show us how the different things they measure track across the different individuals and cancer types. It is possible that some trends would be more convincing if we saw them happening again and again across all individuals. But, as I said above, the authors do not identify individuals clearly across all their types of analysis nor do they explain why sometimes they show analysis of specific individuals.
For overlap analysis (Fig. 2 except panel B), CDR3 properties analysis (Fig. 3, Fig. S7), clonal group analysis (Fig. 4) we used pooled data on all cancers, unless it is indicated otherwise on the panel. For overlap analysis, we used Cytoscape graph (Fig. 2B) for one patient, mp3, to illustrate the findings that were made on pooled data. For other types of analysis, such as overlap between individual lymph nodes, or tumor fragments (Fig. 5, 6, 7 except panel F) pooled analysis is not possible due to the individual nature of the processes in question.
(2) The authors do not address how lacking their sampling is nor the distribution of clone sizes in different tissues/ individuals/ subsets. Without such a discussion it is not clear how tenuous or convincing their conclusions are.
(3) The short sequencing lengths limit the ability to exactly identify V and thus the germline root of clones, whose positions are mutated and clonal association of sequences. The authors appear to be aware of this as they often use the most common ancestor as the start of their analysis... however, again there are inconsistencies that are not clearly described in the text. in creating trees with change they defined roots as the putative germline and at least in most cases also in clone association although in some analyses potentially similar clones were collapsed into clonotypes. Again it is not clear when one method was used or the other and how the choice was made what to choose.
Here we can only state that we consistently used the approach described in the Methods section, which was the following:
First, the repertoires were clustered into clonal lineages using the criteria described in “Methods: Clonal lineage inference”. Assuming that each clonotype sequence in the clonal lineage originated from the same ancestor, we try to recover the phylogeny. Please note that we refer to the individual BCR sequences as “clonotypes”, and to a group of clonotypes that presumably share a common ancestor as “clonal lineage” or “clonal group”.
The phylogeny of B-cell hypermutations was inferred for each clonal lineage of size five or more using the maximum likelihood method and the GTR GAMMA nucleotide substitution model. To find the most recent common ancestor (MRCA) or “root” of the tree, we used an artificial outgroup constructed as a conjugate of germline segments V and J defined by MiXCR and added it to the clonal lineage. The D segment was excluded from the outgroup formation, as there was insufficient confidence in the germline annotations due to its short length and high level of mutations. The rest of the clonotypes were still analyzed in their original form with the D segment in place. Deleting the D segment from the outgroup simply eliminates the risk of biasing the phylogeny by misassigning the D segment germline sequence to the outgroup. The MUSCLE tool was used for multiple sequence alignment and RAxML software was used to build and root phylogenetic trees.
(4) Beyond the statistical issues mentioned above: the unclear selection of individual examples for comparison and significance testing, the mixing of individuals and cancer types without clear identification, etc. there is in general a lack of coherence in the statistical analysis performed. specifically:
(a) the authors should choose one cutoff for significance (0.01 for instance) and then just mention when things are significant and when not. There is no need and it is confusing to add the p-value for every comparison. P-values are not good measures of effect size.
We corrected the figures and left p-values only where they are below significance threshold.
(b) the Bonferroni correction used is not well characterized. For an alpha of 0.01 in Figures 3 C and D how many tests were performed?
The number of tests performed that was used for Bonferroni-Holm correction equals the number of comparisons on the heatmap which makes it 39 for each heatmap on Fig 3C and 13 for Fig 3D.
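For readers unfamiliar with the step-down procedure named above, the following is a minimal Python sketch of a Bonferroni-Holm decision rule. The function name is hypothetical, the p-values in the example are invented, and alpha = 0.01 is taken from the reviewer's question; none of this reproduces the authors' actual computation.

```python
def holm_reject(p_values, alpha=0.01):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Holm step-down rule: sort p-values in ascending order and compare the
    k-th smallest (k = 0, 1, ...) with alpha / (m - k); stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are also not rejected
    return reject


if __name__ == "__main__":
    # Invented example with m = 39 comparisons, as reported for each Fig 3C heatmap.
    p_values = [0.0002] + [0.5] * 38
    decisions = holm_reject(p_values, alpha=0.01)
    print(sum(decisions), "of", len(decisions), "comparisons rejected")
```

With 39 comparisons the smallest p-value must fall below 0.01/39 (about 0.00026) to be called significant, and with 13 comparisons below 0.01/13 (about 0.00077), so the number of tests directly sets how strict the first threshold is.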
Finally some minor issues -
(1) Not all acronyms are described, for instance, TME and TIL. The first time any acronym is used it should be spelled out. -> Katya B: list of abbreviations
(2) The figure captions are not all there...
(a) there is no caption for Figure 3E.
corrected
(b) there are Figure 7 F and G panels but no Figure 7E panel and Figure F is described after Figure G.
corrected
(3) A few problems with wording -
(a) bottom paragraph of page 3 - instead of :
"different lymph nodes from one draining lymph node pool may be more or less involved"
Corrected to "different lymph nodes from one draining lymph node pool may be differentially involved"
(b) figure caption for figure 3a: instead of:
"CDR3 are on average significantly higher in tumor"
Corrected to "CDR3 are on average significantly longer in tumor"
Reviewer #3 (Recommendations For The Authors):
- FIG1A - Suggest expanding the legend to include more information on the computational analyses.
added
- PAGE SIX: Suggest adding a table or some text on patient characteristics. Numbers of unique clonotypes per sample etc. Are there differences in age/sex that need to be considered? Some clonotype information is available in S1 but some summary and statistics would be appreciated.
Added patient information as Supplementary table 2.
- PAGE SIX: F2 Metric, suggestion to explain why this was used vs. other metrics.
We expanded the following paragraph to include information about F2 metric and D metric, and the reason why we are using F2.
Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point. We used the repertoire overlap metric F2 (clonotype-wise sum of geometric mean frequencies of overlapping clonotypes), which accounts for both the number and frequency of overlapping clonotypes (Fig. 2A). As expected, significantly lower overlaps were observed between the IGH repertoires of peripheral blood and tumors compared to LN/tumor overlaps. The LN/PBMC overlap also tended to be lower, but the difference was not statistically significant. We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (D_ij = d_ij / (d_i × d_j), where d_ij is the number of clonotypes present in both samples, while d_i and d_j are the diversities of samples i and j respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of tumor-draining LNs than to those of peripheral blood, both if clonotype frequency is taken into account, and when it is not.
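The two overlap metrics defined in the paragraph above can be written out as a short Python sketch. It follows the verbal definitions given there (F2 as the sum of geometric mean frequencies of shared clonotypes, D as shared count divided by the product of the two diversities); the function names, the dictionary representation of a repertoire, and the example frequencies are assumptions for illustration only.

```python
from math import sqrt


def f2_overlap(freq_i, freq_j):
    """Sum over shared clonotypes of the geometric mean of their two frequencies.

    freq_i, freq_j: dicts mapping a clonotype key to its frequency in each sample.
    """
    shared = freq_i.keys() & freq_j.keys()
    return sum(sqrt(freq_i[c] * freq_j[c]) for c in shared)


def d_overlap(freq_i, freq_j):
    """Shared clonotype count divided by the product of the two sample diversities.

    Frequencies are ignored; only presence or absence of a clonotype matters.
    """
    d_ij = len(freq_i.keys() & freq_j.keys())
    return d_ij / (len(freq_i) * len(freq_j))


if __name__ == "__main__":
    # Invented repertoires keyed by hypothetical clonotype labels.
    tumor = {"clonotype_a": 0.5, "clonotype_b": 0.5}
    lymph_node = {"clonotype_a": 0.5, "clonotype_c": 0.5}
    blood = {"clonotype_d": 0.5, "clonotype_e": 0.5}

    print("F2 tumor/LN:", f2_overlap(tumor, lymph_node))
    print("D  tumor/LN:", d_overlap(tumor, lymph_node))
    print("F2 tumor/blood:", f2_overlap(tumor, blood))
```

Because F2 weights each shared clonotype by its abundance while D only counts it, agreement between the two indicates that an overlap trend is not driven solely by a few large clones.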
- PAGE SIX: Make clear in the text that mp3 is a patient.
Added “melanoma patient mp3”
- PAGE EIGHT: Suggest explaining kidera factors at first use - not all readers will know what they are.
We expanded the following paragraph to add more information about Kidera factors:
To explore CDR-H3 physicochemical properties, we calculated the mean charge, hydropathy, predicted interaction strength, and Kidera factors 1 - 9 (kf1-kf9) for five central amino acids of the CDR-H3 region for the 100 most frequent clonotypes of each sample using VDJtools. Kidera factors are a set of scores which quantify physicochemical properties of protein sequences (ref. 61). 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques, to yield 9 factors which are used to quantitatively characterize physicochemical properties of amino acid sequences.
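To make the "five central amino acids" convention concrete, here is a minimal Python sketch that extracts the central window of a CDR-H3 amino acid sequence and averages a simple net charge over a set of clonotypes. Several details are assumptions of this sketch and not statements about VDJtools or the manuscript: the rounding rule for the window position, the simplified charge table (K and R as +1, D and E as -1, all other residues including H as 0), the unweighted mean, and the example sequences.

```python
# Simplified residue charges (assumption): lysine/arginine +1, aspartate/glutamate -1.
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def central_residues(cdr3_aa, window=5):
    """Return the central `window` residues of a CDR-H3 amino acid sequence.

    When the flanks cannot be equal, the extra residue is left on the
    C-terminal side (assumption of this sketch).
    """
    if len(cdr3_aa) <= window:
        return cdr3_aa
    start = (len(cdr3_aa) - window) // 2
    return cdr3_aa[start:start + window]


def mean_central_charge(cdr3_sequences, window=5):
    """Unweighted mean net charge of the central window across clonotypes."""
    charges = [
        sum(RESIDUE_CHARGE.get(aa, 0) for aa in central_residues(seq, window))
        for seq in cdr3_sequences
    ]
    return sum(charges) / len(charges)


if __name__ == "__main__":
    # Invented CDR-H3 sequences standing in for a sample's most frequent clonotypes.
    top_clonotypes = ["CARDGYSSGWYFDYW", "CAKRKELW"]
    for seq in top_clonotypes:
        print(seq, "->", central_residues(seq))
    print("mean central charge:", mean_central_charge(top_clonotypes))
```

Stating the window size explicitly in one place, as done here with the `window` parameter, is one way to avoid the five-versus-seven residue inconsistency raised by Reviewer #1.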
- Fig 5D is not referred to.
Corrected
-
-
www.biorxiv.org www.biorxiv.org
-
Reviewer #1 (Public Review):
Summary:
Kroeg et al. describe a novel method for the 2D culture of human induced pluripotent stem cells (hiPSCs) to form cortical tissue in a multiwell format. The method claims to offer a significant advancement over existing developmental models. Their approach allows them to generate cultures with precise, reproducible dimensions and structure, with a single rosette and consistent geometry, incorporating multiple neuronal and glial cell types (cellular diversity) and avoiding the necrotic core (often seen in free-floating models due to limited nutrient and oxygen diffusion). The researchers demonstrate the method's capacity for long-term culture, exceeding ten months, and show the formation of mature dendritic spines and considerable neuronal activity. The method aims to tackle multiple key problems of in vitro neural cultures: reproducibility, diversity, topological consistency, and electrophysiological activity. The authors suggest the method's potential in high-throughput screening and neurotoxicological studies.
Strengths:
The main advance of the paper is that the culture developed by the authors appears to provide optimal conditions for neural differentiation, lineage diversification, and long-term culture beyond 300 days. This seems to me a major strength of the paper and an important contribution to the field. The authors present solid evidence of the high cell type diversity present in their cultures. This is a major point, and it could therefore be better compared to the state of the art. I commend the authors for using three different iPSC lines; this is a very important part of their proof. The staining and imaging in the manuscript are of excellent quality.
Weaknesses:
(1) The title is misleading: The presented cultures appear not to be organoids, but 2D neural cultures, with an insufficiently described intermediate EB stage. For nomenclature, see: doi: 10.1038/s41586-022-05219-6. Should the tissue develop considerable 3D depth, it would suffer from the same limited nutrient supply as 3D models - as the authors point out in their introduction.
(2) The method therefore should be compared to state-of-the-art (well-based or not) 2D cultures, which seems to be somewhat overlooked in the paper, therefore making it hard to assess what the advance is that is presented by this work.
(3) Reproducibility is prominently claimed throughout the manuscript. However, it is challenging to assess this claim based on the data presented, which mostly contain single frames of unquantified, high-resolution images. There are almost no systematic quantifications presented. The ones present (Figure S1D, Figure 4) show very large variability. However, the authors show sets of images across wells (Figure S1B, Figure S3) which hint that in some important aspects, the culture seems reproducible and robust.
(4) What is in the middle? All images show markers in cells present around the center. The center however seems to be a dense lump of cells based on DAPI staining. What is the identity of these cells? Do these cells persist throughout the protocol? Do they divide? Until when? Addressing this prominent cell population is currently lacking.
(5) This manuscript proposes a new method of 2D neural culture. However, the description and representation of the method are currently insufficient.
(a) The results section would benefit from a clear and concise, but step-by-step overview of the protocol. The current description refers to an earlier paper and appears to skip over some key steps. This section would benefit from being completely rewritten. This is not a replacement for a clear methods section, but a section that allows readers to clearly interpret results presented later.
(b) Along the same lines, the graphical abstract should be much more detailed. It should contain the time frames and the media used at the different stages of the protocol, seeding numbers, etc.
-
Reviewer #2 (Public Review):
Summary:
In this manuscript, van der Kroeg et al have developed a method for creating 3D cortical organoids using iPSC-derived neural progenitor cells in 384-well plates, thus scaling down the neural organoids to adherent culture and a smaller format that is amenable to high throughput cultivation. These adherent cortical organoids, measuring 3 x 3 x 0.2 mm, self-organize over eight weeks and include multiple neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.
Strengths:
(1) The organoids can be cultured for up to 10 months, exhibiting mature dendritic spines, axonal myelination, and robust neuronal activity.
(2) Unlike free-floating organoids, these do not develop necrotic cores, making them ideal for high-throughput drug discovery, neurotoxicological screening, and brain disorder studies.
(3) The method addresses the technical challenge of achieving higher-order neural complexity with reduced heterogeneity and the issue of necrosis in larger organoids. The method presents a technical advance in organoid culture.
(4) The method has been demonstrated with multiple cell lines which is a strength.
(5) The manuscript provides high-quality immunostaining for multiple markers.
Weaknesses:
(1) Direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, ie what can be done with the new method that cannot be done with standard culture and vice versa, ie what are the aspects in which new method could be inferior to the standard.
(2) It would be important to further benchmark the throughput, ie what is the success rate in filling and successfully growing the organoids in the entire 384 well plate?
(3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs.
-
Reviewer #3 (Public Review):
Summary:
Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.
Strengths:
Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems.
Weaknesses:
While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Mainly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.
-
Author response:
Public Reviews:
Reviewer #1 (Public Review):
Summary:
Kroeg et al. describe a novel method for 2D culture of human induced pluripotent stem cells (hiPSCs) to form cortical tissue in a multiwell format. The method claims to offer a significant advancement over existing developmental models. Their approach allows them to generate cultures with precise, reproducible dimensions and structure with a single rosette; consistent geometry; incorporating multiple neuronal and glial cell types (cellular diversity); avoiding the necrotic core (often seen in free-floating models due to limited nutrient and oxygen diffusion). The researchers demonstrate the method's capacity for long-term culture, exceeding ten months, and show the formation of mature dendritic spines and considerable neuronal activity. The method aims to tackle multiple key problems of in vitro neural cultures: reproducibility, diversity, topological consistency, and electrophysiological activity. The authors suggest their potential in high-throughput screening and neurotoxicological studies.
Strengths:
The main advances in the paper seem to be: The culture developed by the authors appears to have optimal conditions for neural differentiation, lineage diversification, and long-term culture beyond 300 days. These seem to me as a major strength of the paper and an important contribution to the field. The authors present solid evidence about the high cell type diversity present in their cultures. It is a major point and therefore it could be better compared to the state of the art. I commend the authors for using three different IPS lines, this is a very important part of their proof. The staining and imaging quality of the manuscript is of excellent quality.
We thank the reviewer for the positive comments on the potential of our novel platform to address key problems of in vitro neural culture, highlighting the longevity and reproducibility of the method across multiple cell lines.
Weaknesses:
(1) The title is misleading: The presented cultures appear not to be organoids, but 2D neural cultures, with an insufficiently described intermediate EB stage. For nomenclature, see: doi: 10.1038/s41586-022-05219-6. Should the tissue develop considerable 3D depth, it would suffer from the same limited nutrient supply as 3D models - as the authors point out in their introduction.
We appreciate the opportunity to clarify this point. We respectfully disagree that the cultures do not meet the consensus definition of an organoid. In fact, a direct quote from the seminal nomenclature paper referenced by the reviewer states: “We define organoids as in vitro-generated cellular systems that emerge by self-organization, include multiple cell types, and exhibit some cytoarchitectural and functional features reminiscent of an organ or organ region. Organoids can be generated as 3D cultures or by a combination of 3D and 2D approaches (also known as 2.5D) that can develop and mature over long periods of time (months to years).” (Pasca et al, 2022, doi: 10.1038/s41586-022-05219-6). Therefore, while many organoid types indeed have a more spherical or globular 3D shape, the term organoid also applies to semi-3D or non-globular adherent organoids, such as renal (Czerniecki et al 2018, doi.org/10.1016/j.stem.2018.04.022) and gastrointestinal organoids (Kakni et al 2022, doi.org/10.1016/j.tibtech.2022.01.006). Accordingly, the adherent cortical organoids described in the manuscript exhibit self-organization to single radial structures consisting of multiple cell layers in the z-axis, reaching ~200 µm thickness (therefore remaining within the limits for sufficient nutrient supply), with consistent cytoarchitectural topology and electrophysiological activity, and therefore meet the consensus definition of an organoid.
(2) The method therefore should be compared to state-of-the-art (well-based or not) 2D cultures, which seems to be somewhat overlooked in the paper, therefore making it hard to assess what the advance is that is presented by this work.
It was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods. Compared to state-of-the-art 2D neural network cultures, adherent cortical organoids provide distinct advantages in:
(1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.
(2) Longevity: adherent cortical organoids can be successfully kept in culture up to 1 year where 2D cultures typically deteriorate after 8-12 weeks.
(3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.
(4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks) and the emergence of oligodendrocyte lineage cells.
On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures are:
(1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.
(2) Whole cell patch clamping is not easily feasible in the organoids because of the restricting dimensions of the 384-well plates.
(3) Reproducibility is prominently claimed throughout the manuscript. However, it is challenging to assess this claim based on the data presented, which mostly contain single frames of unquantified, high-resolution images. There are almost no systematic quantifications presented. The ones present (Figure S1D, Figure 4) show very large variability. However, the authors show sets of images across wells (Figure S1B, Figure S3) which hint that in some important aspects, the culture seems reproducible and robust.
We made considerable efforts to establish quantitative metrics to assess reproducibility. We applied a quantitative scoring system of single radial structures at different time points for multiple batches of all three lines as indicated in Figure S1D. This figure represents a comprehensive dataset in which each dot represents the average of a different batch of organoids containing 10-40 organoids per batch. To emphasize this, we will adapt the graph to better reflect the breadth of the dataset. Additional quantifications are given in Figure S2 for progenitor and layer markers for Line 1 and in Figure S5 for interneurons across all three lines, showing relatively low variability. That being said, we acknowledge the reviewer’s concerns and will modify the text to reduce the emphasis of this point, pending more extensive data addressing reproducibility across a wide range of parameters.
(4) What is in the middle? All images show markers in cells present around the center. The center however seems to be a dense lump of cells based on DAPI staining. What is the identity of these cells? Do these cells persist throughout the protocol? Do they divide? Until when? Addressing this prominent cell population is currently lacking.
A more comprehensive characterization of the cells in the center remains a significant challenge due to the high cell density hindering antibody penetration. However, dye-based staining methods such as DAPI and the LIVE/DEAD panel confirm a predominance of intact nuclei with very minimal cell death. The limited available data suggest that a substantial proportion of the cells in the center are proliferative neural progenitors, indicated by immunolabeling for SOX2 and Ki67. We will add additional figures to support these findings. Furthermore, we are currently optimizing the conditions to perform single cell / nuclear RNA sequencing to further characterize the cellular composition of the organoids.
(5) This manuscript proposes a new method of 2D neural culture. However, the description and representation of the method are currently insufficient.
(a) The results section would benefit from a clear and concise, but step-by-step overview of the protocol. The current description refers to an earlier paper and appears to skip over some key steps. This section would benefit from being completely rewritten. This is not a replacement for a clear methods section, but a section that allows readers to clearly interpret results presented later.
We will revise the manuscript to include a more detailed step-by-step overview of the protocol.
(b) Along the same lines, the graphical abstract should be much more detailed. It should contain the time frames and the media used at the different stages of the protocol, seeding numbers, etc.
As suggested, we will also adapt the graphical abstract to include more detail.
Reviewer #2 (Public Review):
Summary:
In this manuscript, van der Kroeg et al have developed a method for creating 3D cortical organoids using iPSC-derived neural progenitor cells in 384-well plates, thus scaling down the neural organoids to adherent culture and a smaller format that is amenable to high throughput cultivation. These adherent cortical organoids, measuring 3 x 3 x 0.2 mm, self-organize over eight weeks and include multiple neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.
Strengths:
(1) The organoids can be cultured for up to 10 months, exhibiting mature dendritic spines, axonal myelination, and robust neuronal activity.
(2) Unlike free-floating organoids, these do not develop necrotic cores, making them ideal for high-throughput drug discovery, neurotoxicological screening, and brain disorder studies.
(3) The method addresses the technical challenge of achieving higher-order neural complexity with reduced heterogeneity and the issue of necrosis in larger organoids. The method presents a technical advance in organoid culture.
(4) The method has been demonstrated with multiple cell lines which is a strength.
(5) The manuscript provides high-quality immunostaining for multiple markers.
We appreciate the reviewer’s acknowledgement of the strengths of this novel platform as a technical advance in organoid culture that reduces heterogeneity and shows potential for higher throughput experiments.
Weaknesses:
(1) Direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, ie what can be done with the new method that cannot be done with standard culture and vice versa, ie what are the aspects in which new method could be inferior to the standard.
In our opinion, it would be extremely difficult to directly compare methods because of substantial differences. Most notably, whole brain organoids grow to large and irregular globular shapes, while adherent cortical organoids have a highly standardized shape confined by the limits of a 384-well. Moreover, it was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods.
(2) It would be important to further benchmark the throughput, ie what is the success rate in filling and successfully growing the organoids in the entire 384 well plate?
Figure S1D shows the success rate of organoid formation and stability of the organoid structures over time. In addition, we will add the number of wells that were filled per plate.
(3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs.
Figure S1C provides the relationship between proliferation rate and seeding density, allowing estimation of seeding densities based on the proliferation rate of the NPCs. However, we appreciate the reviewer's feedback and will modify the methods to provide more detail.
Reviewer #3 (Public Review):
Summary:
Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.
Strengths:
Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems.
We thank the reviewer for highlighting the strengths of our novel platform. We appreciate that all three reviewers agree that the adherent cortical organoids presented in this manuscript reliably demonstrate increased reproducibility and longevity. They also commend its potential for higher throughput drug discovery and neurotoxicological/phenotype screening purposes.
Weaknesses:
While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Mainly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.
We appreciate the feedback and will add more detail on consistency and standardization of functional outputs.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This important study reports that a transcription factor that stimulates mRNA synthesis can stabilize its target transcripts, possibly through co-transcriptional assembly and action in the cytoplasm. While the primary observation is solid, whether an association of Sfp1 with specific transcripts in the cytoplasm is the critical step in transcript stabilization is not entirely clear. If confirmed by independent means, the authors would have found a novel mechanistic link between mRNA synthesis and cytoplasmic mRNA stability for specific transcripts. Such a finding would be of broad interest to the field of molecular biology.
-
Reviewer #2 (Public Review):
Summary:
The manuscript by Kelbert et al. presents results on the involvement of the yeast transcription factor Sfp1 in the stabilisation of transcripts whose synthesis it stimulates. Sfp1 is known to affect the synthesis of a number of important cellular transcripts, such as many of those that code for ribosomal proteins. The hypothesis that a transcription factor can remain bound to the nascent transcript and affect its cytoplasmic half-life is attractive. However, the association of Sfp1 with cytoplasmic transcripts remains to be validated, as explained in the following comments:
A two-hybrid based assay for protein-protein interactions identified Sfp1, a transcription factor known for its effects on ribosomal protein gene expression, as interacting with Rpb4, a subunit of RNA polymerase II. Classical two-hybrid experiments depend on the presence of the tested proteins in the nucleus of yeast cells, suggesting that the observed interaction occurs in the nucleus. Unfortunately, the two-hybrid method cannot determine whether the interaction is direct or mediated by nucleic acids. The revised version of the manuscript now states that the observed interaction could be indirect.
To understand to which RNA Sfp1 might bind, the authors used an N-terminally tagged fusion protein in a cross-linking and purification experiment. This method identified 264 transcripts for which the CRAC signal was considered positive and which mostly correspond to abundant mRNAs, including 74 ribosomal protein mRNAs or metabolic enzyme-abundant mRNAs such as PGK1. The authors did not provide evidence for the specificity of the observed CRAC signal, in particular what would be the background of a similar experiment performed without UV cross-linking. This is crucial, as Figure S2G shows very localized and sharp peaks for the CRAC signal, often associated with over-amplification of weak signal during sequencing library preparation.
In a validation experiment, the presence of several mRNAs in a purified SFP1 fraction was measured at levels that reflect the relative levels of RNA in a total RNA extract. Negative controls showing that abundant mRNAs not found in the CRAC experiment were clearly depleted from the purified fraction with Sfp1 would be crucial to assess the specificity of the observed protein-RNA interactions (to complement Fig. 2D). The CRAC-selected mRNAs were enriched for genes whose expression was previously shown to be upregulated upon Sfp1 overexpression (Albert et al., 2019). The presence of unspliced RPL30 pre-mRNA in the Sfp1 purification was interpreted as a sign of co-transcriptional assembly of Sfp1 into mRNA, but in the absence of valid negative controls, this hypothesis would require further experimental validation. Also, whether the fraction of mRNA bound by Sfp1 is nuclear or cytoplasmic is unclear.
To address the important question of whether co-transcriptional assembly of Sfp1 with transcripts could alter their stability, the authors first used a reporter system in which the RPL30 transcription unit is transferred to vectors under different transcriptional contexts, as previously described by the Choder laboratory (Bregman et al. 2011). While RPL30 expressed under an ACT1 promoter was barely detectable, the highest levels of RNA were observed in the context of the native upstream RPL30 sequence when Rap1 binding sites were also present. Sfp1 showed better association with reporter mRNAs containing Rap1 binding sites in the promoter region. Removal of the Rap1 binding sites from the reporter vector also led to a drastic decrease in reporter mRNA levels. Co-purification of reporter RNA with Sfp1 was only observed when Rap1 binding sites were included in the reporter. Negative controls for all the purification experiments might be useful.
To complement the biochemical data presented in the first part of the manuscript, the authors turned to the deletion or rapid depletion of SFP1 and used labelling experiments to assess changes in the rate of synthesis, abundance and decay of mRNAs under these conditions. An important observation was that in the absence of Sfp1, mRNAs encoding ribosomal protein genes not only had a reduced synthesis rate, but also an increased degradation rate. This important observation needs careful validation, as genomic run-on experiments were used to measure half-lives, and this particular method was found to give results that correlated poorly with other measures of half-life in yeast (e.g. Chappelboim et al., 2022 for a comparison). As an additional validation, a temperature shift to 42°C was used to show that, for specific ribosomal protein mRNAs, the degradation was faster, assuming that transcription stops at that temperature. It would be important to cite and discuss the work from the Tollervey laboratory showing that a temperature shift to 42°C leads to a strong and specific decrease in ribosomal protein mRNA levels, probably through an accelerated RNA degradation (Bresson et al., Mol Cell 2020, e.g. Fig 5E). Finally, the conclusion that mRNA deadenylation rate is altered in the absence of Sfp1 is difficult to assess from the presented results (Fig. 3D).
The effects of SFP1 on transcription were investigated by chromatin purification with Rpb3, a subunit of RNA polymerase, and the results were compared with synthesis rates determined by genomic run-on experiments. The decrease in polII presence on transcripts in the absence of SFP1 was not accompanied by a marked decrease in transcript output, suggesting an effect of Sfp1 in ensuring robust transcription and avoiding RNA polymerase backtracking. To further investigate the phenotypes associated with the depletion or absence of Sfp1, the authors examined the presence of Rpb4 along transcription units compared to Rpb3. An effect of Sfp1 deficiency was that this ratio, which decreased from the start of transcription towards the end of transcripts, increased slightly. To what extent this result is important for the main message of the manuscript is unclear.
Suggestions: a) please clearly indicate in the figures when they correspond to reanalyses of published results. b) In table S2, it would be important to mention what the results represent and what statistics were used for the selection of "positive" hits.
Strengths:
- Diversity of experimental approaches used.
- Validation of large-scale results with appropriate reporters.
Weaknesses:
- Lack of controls for the CRAC results and lack of negative controls for the co-purification experiments that were used to validate specific mRNA targets potentially bound by Sfp1.
- Several conclusions are derived from complex correlative analyses that fully depend on the validity of the aforementioned Sfp1-mRNA interactions.
-
-
www.medrxiv.org www.medrxiv.org
-
eLife assessment
This useful study reports machine learning models derived from large-scale data to predict the risk of post-stroke epilepsy. The evidence supporting the conclusions is, however, incomplete, as many critical methodological aspects have been omitted or described too briefly, the analysis of the results is not complete, and the dataset and code have not been disclosed, which represents an obstacle to reproducibility. The study may be of some interest in the field of clinical neurology.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This important study substantially advances our understanding of energy landscapes and their link to animal ontogeny. The evidence supporting the conclusions is compelling, with high-throughput telemetry data and advanced track segmentation methods used to develop and map energy landscapes. The work will be of broad interest to animal ecologists.
-
-
www.researchsquare.com www.researchsquare.com
-
eLife assessment
This important study combines fMRI and electrophysiology in sedated and awake rats to show that LFPs strongly explain spatial correlations in resting-state fMRI but only weakly explain temporal variability. The authors propose that other, electrophysiology-invisible mechanisms contribute to the fMRI signal. The evidence supporting the separation of spatial and temporal correlations is convincing, and the authors consider alternative potential factors that could account for the differences in spatial and temporal correlation that were observed. This work will be of interest to researchers who study the mechanisms behind resting-state fMRI.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This study presents a useful method for the extraction of behaviour-related activity from neural population recordings based on a specific deep learning architecture, a variational autoencoder. Although the authors performed thorough benchmarking of their method in the context of decoding behavioural variables, the evidence supporting claims about encoding is incomplete as the results may stem, in part, from the properties of the method itself.
-
Reviewer #1 (Public Review):
This work seeks to understand how behaviour-related information is represented in the neural activity of the primate motor cortex. To this end, a statistical model of neural activity is presented that enables a non-linear separation of behaviour-related from unrelated activity. As a generative model, it enables the separate analysis of these two activity modes, here primarily done by assessing the decoding performance of hand movements the monkeys perform in the experiments. Several lines of analysis are presented to show that while the neurons with significant tuning to movements strongly contribute to the behaviourally-relevant activity subspace, less or un-tuned neurons also carry decodable information. It is further shown that the discovered subspaces enable linear decoding, leading the authors to conclude that motor cortex read-out can be linear.
Strengths:
In my opinion, using an expressive generative model to analyse neural state spaces is an interesting approach to understand neural population coding. While potentially sacrificing interpretability, this approach allows capturing both redundancies and synergies in the code as done in this paper. The model presented here is a natural non-linear extension of a previous linear model (PSID) and uses weak supervision in a manner similar to a previous non-linear model (TNDM).
Weaknesses:
This revised version provides additional evidence to support the authors' claims regarding model performance and interpretation of the structure of the resulting latent spaces, in particular the distributed neural code over the whole recorded population, not just the well-tuned neurons. The improved ability to linearly decode behaviour from the relevant subspace and the analysis of the linear subspace projections in my opinion convincingly demonstrate that the model picks up behaviour-relevant dynamics, and that these are distributed widely across the population. As reviewer 3 also points out, I would, however, caution against interpreting this as evidence for linear read-out of the motor system: your model performs a non-linear transformation, and while this is indeed linearly decodable, the motor system would need to do something similar first to achieve the same. In fact to me it seems to show the opposite, that behaviour-related information may not be generally accessible to linear decoders (including to down-stream brain areas).
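The caveat about linear read-out can be illustrated with a toy example. The sketch below is purely illustrative and uses made-up data, not the d-VAE model or the recorded datasets: a hypothetical "behaviour" variable depends quadratically on a one-dimensional "activity" variable, so a linear decoder fails on the raw activity but succeeds on a latent obtained after a non-linear transformation. All names (`linear_fit_r2`, `activity`, `behaviour`, `latent`) are assumptions introduced for this illustration.

```python
# Toy illustration (hypothetical data, not the d-VAE model or the recorded datasets).

def linear_fit_r2(x, y):
    """Fit y ~ slope * x + intercept by ordinary least squares and return R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# "Neural activity": evenly spaced values, symmetric around zero.
activity = [i / 50.0 - 1.0 for i in range(101)]
# "Behaviour": a purely non-linear (quadratic) function of the activity.
behaviour = [3.0 * a ** 2 + 1.0 for a in activity]
# "Latent": the activity after a non-linear transformation (here, squaring).
latent = [a ** 2 for a in activity]

r2_raw = linear_fit_r2(activity, behaviour)    # close to 0: not linearly decodable
r2_latent = linear_fit_r2(latent, behaviour)   # close to 1: linearly decodable

print(f"R^2 from raw activity: {r2_raw:.3f}")
print(f"R^2 from latent:       {r2_latent:.3f}")
```

In this toy case the linear decodability of the latent says nothing about whether the raw signal is linearly readable, which is the distinction drawn in the paragraph above.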
As in my initial review, I would also caution against making strong claims about identifiability although this work and TNDM seem to show that in practise such methods work quite well. CEBRA, in contrast, offers some theoretical guarantees, but it is not a generative model, so would not allow the type of analysis done in this paper. In your model there is a parameter \alpha to balance between neural and behaviour reconstruction. This seems very similar to TNDM and has to be optimised - if this is correct, then there is manual intervention required to identify a good model.
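The kind of trade-off referred to here can be written schematically. The expression below is a generic sketch of a weighted objective, assuming a neural reconstruction term, a behaviour reconstruction term weighted by \alpha, and a prior (KL) term; the symbols \mathcal{L}_{\text{neural}}, \mathcal{L}_{\text{behaviour}} and \beta are illustrative assumptions and are not taken from the manuscript's equations.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{neural}}(\theta)
  + \alpha \, \mathcal{L}_{\text{behaviour}}(\theta)
  + \beta \, \mathrm{KL}\!\left( q_\theta(z \mid x) \,\|\, p(z) \right)
```

Under this reading, a larger \alpha favours behaviour reconstruction at the expense of neural reconstruction, so \alpha would have to be tuned per dataset, which is the manual intervention mentioned above.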
Somewhat related, I also found that the now comprehensive comparison with related models shows that using decoding performance (R2) as a metric for model comparison may be problematic: the R2 values reported in Figure 2 (e.g. the MC_RTT dataset) should be compared to the values reported in the neural latent benchmark, which represent well-tuned models (e.g. AutoLFADS). The numbers (difficult to see, a table with numbers in the appendix would be useful, see: https://eval.ai/web/challenges/challenge-page/1256/leaderboard) seem lower than what can be obtained with models without latent space disentanglement. While this does not necessarily invalidate the conclusions drawn here, it shows that decoding performance can depend on a variety of model choices, and may not be ideal to discriminate between models. I'm also surprised by the low neural R2 for LFADS (I assume this is condition-averaged); LFADS tends to perform very well on this metric.
One statement I still cannot follow is how the prior of the variational distribution is modelled. You say you depart from the usual Gaussian prior, but equation 7 seems to suggest there is a normal prior. Are the parameters of this distribution learned? As I pointed out earlier, I however suspect this may not matter much as you give the prior a very low weight. I also still am not sure how you generate a sample from the variational distribution, do you just draw one for each pass?
Summary:
This paper presents a very interesting analysis, but some concerns remain that mainly stem from the complexity of deep learning models. It would be good to acknowledge these as readers without relevant background need to understand where the possible caveats are.
-
Reviewer #2 (Public Review):
Li et al present a method to extract "behaviorally relevant" signals from neural activity. The method is meant to solve a problem which likely has high utility for neuroscience researchers. There are numerous existing methods to achieve this goal, some of which the authors compare their method to; thankfully, the revised version includes one of the major previous omissions (TNDM). However, I still believe that d-VAE is a promising approach that has its own advantages. Still, I have issues with the paper as-is. The authors have made relatively few modifications to the text based on my previous comments, and the responses have largely just dismissed my feedback and restated claims from the paper. Nearly all of my previous comments remain relevant for this revised manuscript. As such, they have done little to assuage my concerns, the most important of which I will restate here using the labels/notation (Q1, Q2, etc) from the reviewer response.
(Q1) I still remain unconvinced that the core findings of the paper are "unexpected". In the response to my previous Specific Comment #1, they say "We use the term 'unexpected' due to the disparity between our findings and the prior understanding concerning neural encoding and decoding." However, they provide no citations or grounding for why they make those claims. What prior understanding makes it unexpected that encoding is more complex than decoding given the entropy, sparseness, and high dimensionality of neural signals (the "encoding") compared to the smoothness and low dimensionality of typical behavioural signals (the "decoding")?
(Q2) I still take issue with the premise that signals in the brain are "irrelevant" simply because they do not correlate with a fixed temporal lag with a particular behavioural feature hand-chosen by the experimenter. In the response to my previous review, the authors say "we employ terms like 'behaviorally-relevant' and 'behaviorally-irrelevant' only regarding behavioral variables of interest measured within a given task, such as arm kinematics during a motor control task.". This is just a restatement of their definition, not a response to my concern, and does not address my concern that the method requires a fixed temporal lag and continual decoding/encoding. My example of reward signals remains. There is a huge body of literature dating back to the 70s on the linear relationships between neural activity and arm kinematics; in a sense, the authors have chosen the "variable of interest" that proves their point. This all ties back to the previous comment: this is mostly expected, not unexpected, when relating apparently-stochastic, discrete action potential events to smoothly varying limb kinematics.
(Q5) The authors seem to have missed the spirit of my critique: to say "linear readout is performed in motor cortex" is an over-interpretation of what their model can show.
(Q7) Agreeing with my critique is not sufficient; please provide the data or simulations that provide the context for the reference in the Fano factor. I believe my critique is still valid.
(Q8) Thank you for comparing to TNDM, it's a useful benchmark.
-
Reviewer #4 (Public Review):
I am a new reviewer for this manuscript, which has been reviewed before. The authors provide a variational autoencoder that has three objectives in the loss: linear reconstruction of behavior from embeddings, reconstruction of neural data, and KL divergence term related to the variational model elements. They take the output of the VAE as the "behaviorally relevant" part of neural data and call the residual "behaviorally irrelevant". Results aim to inspect the linear versus nonlinear behavior decoding using the original raw neural data versus the inferred behaviorally relevant and irrelevant parts of the signal.
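To make the three-term objective described above concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the mean-squared-error form of both reconstruction terms, the diagonal-Gaussian KL term, and the weights `alpha` and `beta` are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dims, averaged over samples."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(axis=1).mean()

def three_term_loss(x, x_hat, y, y_hat, mu_q, logvar_q, mu_p, logvar_p,
                    alpha=1.0, beta=1e-3):
    """Neural reconstruction + alpha * behavior reconstruction + beta * KL.

    alpha and beta are hypothetical trade-off weights (assumed values).
    """
    neural = np.mean((x - x_hat) ** 2)        # reconstruction of neural data
    behavior = np.mean((y - y_hat) ** 2)      # affine readout of behavior from the embedding
    kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    total = neural + alpha * behavior + beta * kl
    return total, {"neural": neural, "behavior": behavior, "kl": kl}

# Toy data standing in for one batch (all shapes and values are made up).
rng = np.random.default_rng(0)
T, n_neurons, n_latent, n_behav = 100, 20, 4, 2
x = rng.poisson(3.0, size=(T, n_neurons)).astype(float)
y = rng.standard_normal((T, n_behav))
mu_q = rng.standard_normal((T, n_latent))
logvar_q = np.full((T, n_latent), -1.0)
z = mu_q + np.exp(0.5 * logvar_q) * rng.standard_normal((T, n_latent))  # one sample per pass
W, b = rng.standard_normal((n_latent, n_behav)), np.zeros(n_behav)      # affine readout
x_hat = np.full_like(x, x.mean())                                       # placeholder decoder output
total, parts = three_term_loss(x, x_hat, y, z @ W + b,
                               mu_q, logvar_q, np.zeros_like(mu_q), np.zeros_like(logvar_q))
```

Because the behavior term is computed through an affine map of the embedding, minimizing it favors embeddings from which behavior is linearly decodable, which is the source of the bias discussed in the points that follow.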
Overall, studying neural computations that are behaviorally relevant or not is an important problem, which several previous studies have explored (for example PSID in (Sani et al. 2021), TNDM in (Hurwitz et al. 2021), TAME-GP in (Balzani et al. 2023), pi-VAE in (Zhou and Wei 2020), and dPCA in (Kobak et al. 2016), etc). However, this manuscript does not properly put their work in the context of such prior works. For example, the abstract states "One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive", which is not the case given that these prior works have done that. The same is true for various claims in the main text, for example "Furthermore, we found that the dimensionality of primary subspace of raw signals (26, 64, and 45 for datasets A, B, and C) is significantly higher than that of behaviorally-relevant signals (7, 13, and 9), indicating that using raw signals to estimate the neural dimensionality of behaviors leads to an overestimation" (line 321). This finding was presented in (Sani et al. 2021) and (Hurwitz et al. 2021), which is not clarified here. This issue of putting the work in context has been brought up by other reviewers previously but seems to remain largely unaddressed. The introduction is inaccurate also in that it mixes up methods that were designed for separation of behaviorally relevant information with those that are unsupervised and do not aim to do so (e.g., LFADS). The introduction should be significantly revised to explicitly discuss prior models/works that specifically formulated this behavior separation and what these prior studies found, and how this study differs.
Beyond the above, some of the main claims/conclusions made by the manuscript are not properly supported by the analyses and results, which has also been brought up by other reviewers but not fully addressed. First, the analyses here do not support the linear readout from the motor cortex because i) by construction, the VAE here is trained to have a linear readout from its embedding in its loss, which can bias its outputs toward doing well with a linear decoder/readout, and ii) the overall mapping from neural data to behavior includes both the VAE and the linear readout and thus is always nonlinear (even when a linear Kalman filter is used for decoding). This claim is also vague as there is no definition of readout from "motor cortex" or what it means. Why is the readout from the bottleneck of this particular VAE the readout of motor cortex? Second, other claims about properties of individual neurons are also confounded because the VAE is a population-level model that extracts the bottleneck from all neurons. Thus, information can leak from any set of neurons to other sets of neurons during the inference of behaviorally relevant parts of signals. Overall, the results do not convincingly support the claims, and thus the claims should be carefully revised and significantly tempered to avoid misinterpretation by readers.
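The argument that a linear readout from an embedding does not make the whole neural-to-behavior mapping linear can be illustrated with a toy simulation. This is a sketch under strong assumptions: the "encoder" is a made-up squaring nonlinearity rather than a trained VAE, and all variable names and sizes are hypothetical.

```python
import numpy as np

def r2_linear(features, target):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return 1.0 - resid.var() / target.var()

rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 10))   # stand-in for raw population activity
W_enc = rng.standard_normal((10, 3))
z = (x @ W_enc) ** 2                  # hypothetical nonlinear encoder producing the embedding
a = np.array([1.0, -0.5, 2.0])
y = z @ a                             # behavior is an exactly linear readout of the embedding

r2_from_embedding = r2_linear(z, y)   # linear decoder applied to the embedding
r2_from_raw = r2_linear(x, y)         # linear decoder applied to the raw activity
```

In this construction a linear decoder is perfect on the embedding yet explains almost nothing from the raw activity, so a linear readout at the bottleneck says little about whether the full mapping from recorded activity to behavior is linear.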
Below I briefly expand on these as well as other issues, and provide suggestions:
(1) Claims about linearity of "motor cortex" readout are not supported by results yet stated even in the abstract. Instead, what the results support is that for decoding behavior from the output of the dVAE model -- that is trained specifically to have a linear behavior readout from its embedding -- a nonlinear readout does not help. This result can be biased by the very construction of the dVAE's loss that encourages a linear readout/decoding from embeddings and thus does not imply a finding about motor cortex.
(2) Related to the above, it is unclear what the manuscript means by readout from motor cortex. A clearer definition of "readout" (a mapping from what to what?) in general is needed. The mapping that the linearity/nonlinearity claims refer to is from the *inferred* behaviorally relevant neural signals, which themselves are inferred nonlinearly using the VAE. This should be explicitly clarified in all claims, i.e., that only the mapping from distilled signals to behavior is linear, not the whole mapping from neural data to behavior. Again, to say the readout from motor cortex is linear is not supported, including in the abstract.
(3) Claims about individual neurons are also confounded. The d-VAE distillation process is a population-level embedding, so the individual distilled neurons are not obtainable on their own without using the population data. This population-level approach also raises the possibility that information can leak from one neuron to another during distillation, which is indeed what the authors hope would recover true information about individual neurons that wasn't there in the recording (the pixel denoising example). The authors acknowledge the possibility that information could leak to a neuron that didn't truly have that information and try to rule it out to some extent with some simulations and by comparing the distilled behaviorally relevant signals to the original neural signals. But ultimately, the distilled signals are different enough from the original signals to substantially improve decoding of low-information neurons, and one cannot be sure if all of the information in distilled signals from any individual neuron truly belongs to that neuron. It is still quite likely that some of the improved behavior prediction of the distilled version of low-information neurons is due to leakage of behaviorally relevant information from other neurons, not the former's inherent behavioral information. This should be explicitly acknowledged in the manuscript.
(4) Given the nuances involved in appropriate comparisons across methods and since two of the datasets are public, the authors should provide their complete code (not just the dVAE method code), including the code for data loading, data preprocessing, model fitting and model evaluation for all methods and public datasets. This will alleviate concerns and allow readers to confirm conclusions (e.g., figure 2) for themselves down the line.
(5) Related to 1) above, the authors should explore the results if the affine network h(.) (from embedding to behavior) was replaced with a nonlinear ANN. Perhaps linear decoders would no longer be as close to nonlinear decoders. Regardless, the claim of linearity should be revised as described in 1) and 2) above, and all caveats should be discussed.
(6) The beginning of the section on the "smaller R2 neurons" should clearly define what R2 is being discussed. Based on the response to previous reviewers, this R2 "signifies the proportion of neuronal activity variance explained by the linear encoding model, calculated using raw signals". This should be mentioned and made clear in the main text whenever this R2 is referred to.
(7) Various terms require clear definitions. The authors sometimes use vague terminology (e.g., "useless") without a clear definition. Similarly, discussions regarding dimensionality could benefit from more precise definitions. How is neural dimensionality defined? For example, how is "neural dimensionality of specific behaviors" (line 590) defined? Related to this, I agree with Reviewer 2 that a clear definition of irrelevant should be mentioned that clarifies that relevance is roughly taken as "correlated or predictive with a fixed time lag". The analyses do not explore relevance with arbitrary time lags between neural and behavior data.
(8) CEBRA itself doesn't provide a neural reconstruction from its embeddings, but one could obtain one via a regression from extracted CEBRA embeddings to neural data. In addition to decoding results of CEBRA (figure S3), the neural reconstruction of CEBRA should be computed and CEBRA should be added to Figure 2 to see how the behaviorally relevant and irrelevant signals from CEBRA compare to other methods.
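The regression suggested in point (8) could look roughly like the sketch below. It deliberately does not call the CEBRA library: `embedding` is a random stand-in for embeddings extracted by any method, and the ridge penalty `lam`, the function names, and the toy data are assumptions. In practice the regression should be fit and evaluated on separate data splits.

```python
import numpy as np

def ridge_reconstruct(embedding, neural, lam=1e-3):
    """Regress neural data on embeddings; return the reconstruction and the residual."""
    e_mean, x_mean = embedding.mean(axis=0), neural.mean(axis=0)
    E, X = embedding - e_mean, neural - x_mean
    B = np.linalg.solve(E.T @ E + lam * np.eye(E.shape[1]), E.T @ X)
    relevant = E @ B + x_mean      # part of the neural data predicted from the embedding
    residual = neural - relevant   # remainder, treated as the "irrelevant" part
    return relevant, residual

def r2_per_neuron(neural, recon):
    """Fraction of each neuron's variance explained by the reconstruction."""
    ss_res = ((neural - recon) ** 2).sum(axis=0)
    ss_tot = ((neural - neural.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
embedding = rng.standard_normal((500, 4))                    # placeholder embeddings
mixing = rng.standard_normal((4, 12))
neural = embedding @ mixing + 0.5 * rng.standard_normal((500, 12))
relevant, residual = ridge_reconstruct(embedding, neural)
neural_r2 = r2_per_neuron(neural, relevant)
```

The resulting `relevant` and `residual` arrays can then be passed through the same decoding and reconstruction metrics used for the other methods.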
References:
Kobak, Dmitry, Wieland Brendel, Christos Constantinidis, Claudia E Feierstein, Adam Kepecs, Zachary F Mainen, Xue-Lian Qi, Ranulfo Romo, Naoshige Uchida, and Christian K Machens. 2016. "Demixed Principal Component Analysis of Neural Population Data." Edited by Mark CW van Rossum. eLife 5 (April): e10989. https://doi.org/10.7554/eLife.10989.
Sani, Omid G., Hamidreza Abbaspourazad, Yan T. Wong, Bijan Pesaran, and Maryam M. Shanechi. 2021. "Modeling Behaviorally Relevant Neural Dynamics Enabled by Preferential Subspace Identification." Nature Neuroscience 24 (1): 140-49. https://doi.org/10.1038/s41593-020-00733-0.
Zhou, Ding, and Xue-Xin Wei. 2020. "Learning Identifiable and Interpretable Latent Models of High-Dimensional Neural Activity Using Pi-VAE." In Advances in Neural Information Processing Systems, 33:7234-47. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2020/hash/510f2318f324cf07fce24c3a4b89c771-Abstract.html.
Hurwitz, Cole, Akash Srivastava, Kai Xu, Justin Jude, Matthew Perich, Lee Miller, and Matthias Hennig. 2021. "Targeted Neural Dynamical Modeling." In Advances in Neural Information Processing Systems. Vol. 34. https://proceedings.neurips.cc/paper/2021/hash/f5cfbc876972bd0d031c8abc37344c28-Abstract.html.
Balzani, Edoardo, Jean-Paul G. Noel, Pedro Herrero-Vidal, Dora E. Angelaki, and Cristina Savin. 2023. "A Probabilistic Framework for Task-Aligned Intra- and Inter-Area Neural Manifold Estimation." In . https://openreview.net/forum?id=kt-dcBQcSA.
-
Author response:
The following is the authors’ response to the previous reviews.
To the Senior Editor and the Reviewing Editor:
We sincerely appreciate the valuable comments provided by the reviewers, the reviewing editor, and the senior editor. Based on our last response and revision, we are confused by the two limitations noted in the eLife assessment.
(1) benchmarking against comparable methods is limited.
In our last revision, we added the comparison experiments with TNDM, as the reviewers requested. Additionally, it is crucial to emphasize that our evaluation of decoding capabilities of behaviorally relevant signals has been benchmarked against the performance of the ANN on raw signals, which, as Reviewer #1 previously noted, nearly represents the upper limit of performance. Consequently, we believe that our benchmarking methods are sufficiently strong.
(2) some observations may be a byproduct of their method, and may not constitute new scientific observations.
We believe that our experimental results are sufficient to demonstrate that our conclusions are not byproducts of d-VAE based on three reasons:
(1) The d-VAE, as a latent variable model, adheres to the population doctrine, which posits that latent variables are responsible for generating the activities of individual neurons. The goal of such models is to maximize the explanation of the raw signals. At the signal level, the only criterion we can rely on is neural reconstruction performance, in which we have achieved unparalleled results. Thus, it is inappropriate to focus on the mixing process during the model's inference stage while overlooking the crucial de-mixing process during the generation stage and dismissing the significance of our neural reconstruction results. For more details, please refer to the first point in our response to Q4 from Reviewer #4.
(2) The criterion that irrelevant signals should contain minimal information can effectively demonstrate that our conclusions are not by-products of d-VAE. Unfortunately, the reviewers seem to have overlooked this criterion. For more details, please refer to the third point in our response to Q4 from Reviewer #4.
(3) Our synthetic experimental results also substantiate that our conclusions are not byproducts of d-VAE. However, it appears the reviewers did not give these results adequate consideration. For more details, please refer to the fourth point in our response to Q4 from Reviewer #4.
Furthermore, our work presents not just "a useful method" but a comprehensive framework. Our study proposes, for the first time, a framework for defining, extracting, and validating behaviorally relevant signals. In our current revision, to clearly distinguish between d-VAE and other methods, we have formalized the extraction of behaviorally relevant signals into a mathematical optimization problem. To our knowledge, current methods have not explicitly proposed extracting behaviorally relevant signals, nor have they identified and addressed the key challenges of extracting relevant signals. Similarly, existing research has not yet defined and validated behaviorally relevant signals. For more details, please refer to our response to Q1 from Reviewer #4.
Based on these considerations, we respectfully request that you reconsider the eLife assessment of our work. We greatly appreciate your time and attention to this matter.
The main revisions made to the manuscript are as follows:
(1) We have formalized the extraction of behaviorally relevant signals into a mathematical optimization problem, enabling a clearer distinction between d-VAE and other models.
(2) We have moderated the assertion about linear readout to highlight its conjectural nature and have broadened the discussion regarding this conclusion.
(3) We have elaborated on the model details of d-VAE and have removed the identifiability claim.
To Reviewer #1
Q1: “As reviewer 3 also points out, I would, however, caution to interpret this as evidence for linear read-out of the motor system - your model performs a non-linear transformation, and while this is indeed linearly decodable, the motor system would need to do something similar first to achieve the same. In fact to me it seems to show the opposite, that behaviour-related information may not be generally accessible to linear decoders (including to down-stream brain areas).”
Thank you for your comments. It's important to note that the conclusions we draw are speculative and not definitive. We use terms like "suggest" to reflect this uncertainty. To further emphasize the conjectural nature of our conclusions, we have deliberately moderated our tone.
The question of whether behaviorally-relevant signals can be accessed by linear decoders or downstream brain regions hinges on the debate over whether the brain employs a strategy of filtering before decoding. If the brain employs such a strategy, the brain can probably access these signals. In our opinion, it is likely that the brain utilizes this strategy.
Given the existence of behaviorally relevant signals, it is reasonable to assume that the brain has intrinsic mechanisms to differentiate between relevant and irrelevant signals. There is growing evidence suggesting that the brain utilizes various mechanisms, such as attention and specialized filtering, to suppress irrelevant signals and enhance relevant signals [1-3]. Therefore, it is plausible that the brain filters before decoding, thereby effectively accessing behaviorally relevant signals.
Thank you for your valuable feedback.
(1) Sreenivasan, Sameet, and Ila Fiete. "Grid cells generate an analog error-correcting code for singularly precise neural computation." Nature neuroscience 14.10 (2011): 1330-1337.
(2) Schneider, David M., Janani Sundararajan, and Richard Mooney. "A cortical filter that learns to suppress the acoustic consequences of movement." Nature 561.7723 (2018): 391-395.
(3) Nakajima, Miho, L. Ian Schmitt, and Michael M. Halassa. "Prefrontal cortex regulates sensory filtering through a basal ganglia-to-thalamus pathway." Neuron 103.3 (2019): 445-458.
Q2: “As in my initial review, I would also caution against making strong claims about identifiability although this work and TNDM seem to show that in practice such methods work quite well. CEBRA, in contrast, offers some theoretical guarantees, but it is not a generative model, so would not allow the type of analysis done in this paper. In your model there is a parameter \alpha to balance between neural and behaviour reconstruction. This seems very similar to TNDM and has to be optimised - if this is correct, then there is manual intervention required to identify a good model.”
Thank you for your comments.
Considering your concerns about our identifiability claims and the fact that identifiability is not directly relevant to the core of our paper, we have removed content related to identifiability.
Firstly, our model is based on the pi-VAE, which also has theoretical guarantees. However, it is important to note that all such theoretical guarantees (including pi-VAE and CEBRA) are based on certain assumptions that cannot be validated as the true distribution of latent variables remains unknown.
Secondly, it is important to clarify that the identifiability of latent variables does not impact the conclusions of this paper, nor does this paper make specific conclusions about the model's latent variables. Identifiability means that distinct latent variables correspond to distinct observations. If multiple latent variables can generate the same observation, it becomes impossible to determine which one is correct given the observation, which leads to the issue of nonidentifiability. Notably, our analysis focuses on the generated signals, not the latent variables themselves, and thus the identifiability of these variables does not affect our findings.
Our approach, dedicated to extracting these signals, distinctly differs from methods such as TNDM, which focuses on extracting behaviorally relevant latent dynamics. To clearly set apart d-VAE from other models, we have framed the extraction of behaviorally relevant signals as the following mathematical optimization problem:
$$\min_{\boldsymbol{x}_r} \; E(\boldsymbol{x}_r, \boldsymbol{x}) + R(\boldsymbol{x}_r)$$
where $\boldsymbol{x}_r$ denotes generated behaviorally-relevant signals, $\boldsymbol{x}$ denotes raw noisy signals, $E(\cdot,\cdot)$ denotes reconstruction loss, and $R(\cdot)$ denotes regularization loss. It is important to note that while both d-VAE and TNDM employ reconstruction loss, relying solely on this term is insufficient for determining the optimal degree of similarity between the generated and raw noisy signals. The key to accurately extracting behaviorally relevant signals lies in leveraging prior knowledge about these signals to determine the optimal similarity degree, encapsulated by $R(\boldsymbol{x}_r)$. Other studies have not explicitly proposed extracting behaviorally-relevant signals, nor have they identified and addressed the key challenges involved in extracting relevant signals. Consequently, our approach is distinct from other methods.
Thank you for your valuable feedback.
Q3: “Somewhat related, I also found that the now comprehensive comparison with related models shows that using decoding performance (R2) as a metric for model comparison may be problematic: the R2 values reported in Figure 2 (e.g. the MC_RTT dataset) should be compared to the values reported in the neural latent benchmark, which represent well-tuned models (e.g. AutoLFADS). The numbers (difficult to see, a table with numbers in the appendix would be useful, see: https://eval.ai/web/challenges/challenge-page/1256/leaderboard) seem lower than what can be obtained with models without latent space disentanglement. While this does not necessarily invalidate the conclusions drawn here, it shows that decoding performance can depend on a variety of model choices, and may not be ideal to discriminate between models. I'm also surprised by the low neural R2 for LFADS (I assume this is condition-averaged) - LFADS tends to perform very well on this metric.”
Thank you for your comments. The dataset we utilized is not from the same day as the neural latent benchmark dataset. Notably, there is considerable variation in the length of trials within the RTT paradigm, and the dataset lacks explicit trial information, rendering trial-averaging unsuitable. Furthermore, behaviorally relevant signals are not static averages devoid of variability; even behavioral data exhibits variability. We computed the neural R2 using individual trials rather than condition-averaged responses.
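Why single-trial and condition-averaged neural R2 values are not directly comparable can be seen in a small simulation. The sketch below uses invented data (sinusoidal condition means plus unit-variance noise, equal-length trials) and an idealized model that predicts the true condition mean; none of it is taken from the datasets discussed here.

```python
import numpy as np

def r2(actual, predicted):
    """Coefficient of determination pooled over all entries."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
n_cond, n_trials, n_bins = 4, 30, 50
t = np.linspace(0.0, 1.0, n_bins)
rates = np.stack([np.sin(2 * np.pi * (c + 1) * t) for c in range(n_cond)])  # condition means
trials = rates[:, None, :] + rng.standard_normal((n_cond, n_trials, n_bins))
pred = np.broadcast_to(rates[:, None, :], trials.shape)  # same prediction for every trial

r2_single_trial = r2(trials, pred)                   # scored against individual noisy trials
r2_condition_avg = r2(trials.mean(axis=1), rates)    # scored against trial-averaged responses
```

The same predictions score much higher against trial averages because averaging removes most of the trial-to-trial variability from the target, so the two conventions should be stated explicitly whenever neural R2 values are compared across studies.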
Thank you for your valuable feedback.
Q4: “One statement I still cannot follow is how the prior of the variational distribution is modelled. You say you depart from the usual Gaussian prior, but equation 7 seems to suggest there is a normal prior. Are the parameters of this distribution learned? As I pointed out earlier, I however suspect this may not matter much as you give the prior a very low weight. I also still am not sure how you generate a sample from the variational distribution, do you just draw one for each pass?”
Thank you for your questions.
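The conditional distribution of prior latent variables $p_m(\boldsymbol{z}|\boldsymbol{y})$ is a Gaussian distribution, but the distribution of prior latent variables $p(\boldsymbol{z})$ is a Gaussian mixture distribution. The distribution of prior latent variables $p(\boldsymbol{z})$ is:
$$p(\boldsymbol{z}) = \int p_m(\boldsymbol{z}|\boldsymbol{y})\, p(\boldsymbol{y})\, d\boldsymbol{y} = \frac{1}{N}\sum_{i=1}^{N} p_m(\boldsymbol{z}|\boldsymbol{y}^{(i)})$$
where $p(\boldsymbol{y}) = \frac{1}{N}\sum_{i=1}^{N} \delta(\boldsymbol{y} - \boldsymbol{y}^{(i)})$ denotes the empirical distribution of behavioral variables $\boldsymbol{y}$, $N$ denotes the number of samples, $\boldsymbol{y}^{(i)}$ denotes the $i$th sample, $\delta(\cdot)$ denotes the Dirac delta function, and $p_m(\boldsymbol{z}|\boldsymbol{y})$ denotes the conditional distribution of prior latent variables given the behavioral variables, parameterized by network $m$. Based on the above equation, we can see that $p(\boldsymbol{z})$ is not a Gaussian distribution; it is a Gaussian mixture model with $N$ components, which is theoretically a universal approximator of continuous probability densities.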
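Sampling from the N-component mixture prior described above amounts to picking one observed behavior sample uniformly at random and then drawing from the Gaussian conditioned on it. The sketch below is illustrative only: `prior_network` is a hand-written placeholder for the learned network, and the ring-shaped means and fixed variance are assumptions.

```python
import numpy as np

def prior_network(y):
    """Hypothetical stand-in for the prior network: behavior -> (mean, log-variance) of z."""
    mean = 3.0 * np.stack([np.cos(y), np.sin(y)], axis=-1)
    logvar = np.full_like(mean, -2.0)
    return mean, logvar

def sample_mixture_prior(y_samples, n_draws, rng):
    """Draw from p(z) = (1/N) * sum_i p(z | y_i) by ancestral sampling."""
    idx = rng.integers(0, len(y_samples), size=n_draws)   # each component has weight 1/N
    mean, logvar = prior_network(y_samples[idx])
    return mean + np.exp(0.5 * logvar) * rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)
y_samples = rng.uniform(0.0, 2.0 * np.pi, size=200)   # e.g. reach angles (made-up data)
z = sample_mixture_prior(y_samples, 5000, rng)
radius = np.linalg.norm(z, axis=1)
```

Each component is Gaussian, but the pooled samples lie on a ring around the origin, a shape that no single Gaussian prior would produce.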
Learning this prior is important, as illustrated by our latent variable visualizations, which are not Gaussian-distributed. Hypothesis testing on both the latent variables and the behavioral variables shows that neither conforms to a Gaussian distribution (Lilliefors test and Kolmogorov-Smirnov test). Consequently, constraining the latent variables towards N(0,1) is expected to affect performance adversely.
Regarding sampling, during training we draw only one sample from the approximate posterior distribution. It is worth noting that drawing multiple samples or one sample for each pass does not affect the experimental results. After training, we can generate a sample from the prior by providing input behavioral data $\boldsymbol{y}^{(i)}$ and then generating corresponding samples via the conditional prior and the generative (decoder) network. To extract behaviorally-relevant signals from raw signals, we use the inference (encoder) network and the generative (decoder) network.
Thank you for your valuable feedback.
Q5: “(1) I found the figures good and useful, but the text is, in places, not easy to follow. I think the manuscript could be shortened somewhat, and in some places more concise focussed explanations would improve readability.
(2) I would not call the encoding "complex non-linear" - non-linear is a clear term, but complex can mean many things (e.g. is a quadratic function complex?) ”
Thank you for your recommendation. We have revised the manuscript for enhanced clarity. We call the encoding “complex nonlinear” because neurons encode information with varying degrees of nonlinearity, as illustrated in Fig. 3b, f, and Fig. S3b.
Thank you for your valuable feedback.
To Reviewer #2
Q1: “I still remain unconvinced that the core findings of the paper are "unexpected". In the response to my previous Specific Comment #1, they say "We use the term 'unexpected' due to the disparity between our findings and the prior understanding concerning neural encoding and decoding." However, they provide no citations or grounding for why they make those claims. What prior understanding makes it unexpected that encoding is more complex than decoding given the entropy, sparseness, and high dimensionality of neural signals (the "encoding") compared to the smoothness and low dimensionality of typical behavioural signals (the "decoding")?”
Thank you for your comments. We believe that both the complexity of neural encoding and the simplicity of neural decoding in motor cortex are unexpected.
The Complexity of Neural Encoding: As noted in the Introduction, neurons with small R2 values were traditionally considered noise and consequently disregarded, as detailed in references [1-3]. However, after filtering out irrelevant signals, we discovered that these neurons actually contain substantial amounts of behavioral information, previously unrecognized. Similarly, in population-level analyses, neural signals composed of small principal components (PCs) are often dismissed as noise, with analyses typically utilizing only between 6 and 18 PCs [4-10]. Yet, the discarded PC signals nonlinearly encode significant amounts of information, with practically useful dimensions found to range between 30 and 40—far exceeding the usual number analyzed. These findings underscore the complexity of neural encoding and are unexpected.
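One common way to quantify the dimensionality mentioned above is to count the principal components needed to reach a variance threshold. The sketch below assumes a 90% threshold and purely additive, isotropic noise; neither is taken from the manuscript, whose exact definition of the primary subspace may differ.

```python
import numpy as np

def n_components_for_variance(signals, threshold=0.9):
    """Smallest number of PCs whose cumulative explained variance reaches the threshold."""
    centered = signals - signals.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(explained, threshold) + 1)

rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 3))
loading = rng.standard_normal((3, 20))
relevant = latent @ loading                               # rank-3 "relevant" signals
raw = relevant + 1.5 * rng.standard_normal((1000, 20))    # plus unrelated variability

dim_relevant = n_components_for_variance(relevant)
dim_raw = n_components_for_variance(raw)
```

In this toy case the raw signals need many more components than the three that actually drive the relevant part, which mirrors the overestimation described in the text without saying anything about nonlinear encoding in the small components.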
The Simplicity of Neural Decoding: In the motor cortex, nonlinear decoding of raw signals has been shown to significantly outperform linear decoding, as evidenced in references [11,12]. Interestingly, after separating behaviorally relevant and irrelevant signals, we observed that the linear decoding performance of behaviorally relevant signals is nearly equivalent to that of nonlinear decoding—a phenomenon previously undocumented in the motor cortex. This discovery is also unexpected.
Thank you for your valuable feedback.
(1) Georgopoulos, Apostolos P., Andrew B. Schwartz, and Ronald E. Kettner. "Neuronal population coding of movement direction." Science 233.4771 (1986): 1416-1419.
(2) Hochberg, Leigh R., et al. "Reach and grasp by people with tetraplegia using a neurally controlled robotic arm." Nature 485.7398 (2012): 372-375.
(3) Inoue, Yoh, et al. "Decoding arm speed during reaching." Nature communications 9.1 (2018): 5243.
(4) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.
(5) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature neuroscience 17.3 (2014): 440-448.
(6) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature communications 7.1 (2016): 13239.
(7) Sadtler, Patrick T., et al. "Neural constraints on learning." Nature 512.7515 (2014): 423-426.
(8) Golub, Matthew D., et al. "Learning by neural reassociation." Nature neuroscience 21.4 (2018): 607-616.
(9) Gallego, Juan A., et al. "Cortical population activity within a preserved neural manifold underlies multiple motor behaviors." Nature communications 9.1 (2018): 4233.
(10) Gallego, Juan A., et al. "Long-term stability of cortical population dynamics underlying consistent behavior." Nature neuroscience 23.2 (2020): 260-270.
(11) Glaser, Joshua I., et al. "Machine learning for neural decoding." Eneuro 7.4 (2020).
(12) Willsey, Matthew S., et al. "Real-time brain-machine interface in non-human primates achieves high-velocity prosthetic finger movements using a shallow feedforward neural network decoder." Nature Communications 13.1 (2022): 6899.
Q2: “I still take issue with the premise that signals in the brain are "irrelevant" simply because they do not correlate with a fixed temporal lag with a particular behavioural feature hand-chosen by the experimenter. In the response to my previous review, the authors say "we employ terms like 'behaviorally-relevant' and 'behaviorally-irrelevant' only regarding behavioral variables of interest measured within a given task, such as arm kinematics during a motor control task.". This is just a restatement of their definition, not a response to my concern, and does not address my concern that the method requires a fixed temporal lag and continual decoding/encoding. My example of reward signals remains. There is a huge body of literature dating back to the 70s on the linear relationships between neural activity and arm kinematics; in a sense, the authors have chosen the "variable of interest" that proves their point. This all ties back to the previous comment: this is mostly expected, not unexpected, when relating apparently-stochastic, discrete action potential events to smoothly varying limb kinematics.”
Thank you for your comments.
Regarding the experimenter's specification of behavioral variables of interest, we followed common practice in existing studies [1, 2]. Regarding the use of fixed temporal lags, we followed the same practice as papers related to the dataset we use, which assume fixed temporal lags [3-5]. Furthermore, many studies in the motor cortex similarly use fixed temporal lags [6-8].
Concerning the issue of rewards, in the paper you mentioned [9], the impact of rewards occurs after the reaching phase. It's important to note that in our experiments, we analyze only the reaching phase, without any post-movement phase.
If the impact of rewards can be stably reflected in the signals in the reaching phase of the subsequent trial, and if the reward-induced signals do not interfere with decoding—since these signals are harmless for decoding and beneficial for reconstruction—our model is likely to capture these signals. If the signals induced by rewards during the reaching phase are randomly unstable, our model will likely be unable to capture them.
If the goal is to extract post-movement neural activity from both rewarded and unrewarded trials, and if the neural patterns differ between these conditions, one could replace the d-VAE's regression loss, used for continuous kinematics decoding, with a classification loss tailored to distinguish between rewarded and unrewarded conditions.
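A minimal sketch of the loss swap described above, assuming NumPy. It is an illustration only, not the d-VAE implementation; the function names and the choice of mean squared error and binary cross-entropy are assumptions.

```python
import numpy as np

def regression_loss(pred_kinematics, true_kinematics):
    """Mean squared error for continuous targets such as hand velocity."""
    pred = np.asarray(pred_kinematics, dtype=float)
    true = np.asarray(true_kinematics, dtype=float)
    return float(np.mean((pred - true) ** 2))

def classification_loss(logits, labels):
    """Binary cross-entropy for rewarded (1) versus unrewarded (0) trials.

    Uses the numerically stable form of
    -[y * log(p) + (1 - y) * log(1 - p)] with p = sigmoid(logits).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    per_trial = (np.maximum(logits, 0.0) - logits * labels
                 + np.log1p(np.exp(-np.abs(logits))))
    return float(np.mean(per_trial))
```

Only the supervised term changes; the reconstruction term that ties the generated signals to the raw signals would stay as it is.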
To clarify the definition, we have revised it in the manuscript. Specifically, before a specific definition, we briefly introduce the relevant signals and irrelevant signals. Behaviorally irrelevant signals refer to those not directly associated with the behavioral variables of interest and may include noise or signals from variables of no interest. In contrast, behaviorally relevant signals refer to those directly related to the behavioral variables of interest. For instance, rewards in the post-movement phase are not directly related to behavioral variables (kinematics) in the reaching movement phase.
It is important to note that our definition of behaviorally relevant signals covers not only decoding capability but also a specific requirement at the signal level. It rests on two key requirements:
(1) they should closely resemble raw signals, to preserve the underlying neuronal properties, without becoming so similar that they include irrelevant signals (encoding requirement), and (2) they should contain as much behavioral information as possible (decoding requirement). Signals that meet both requirements are considered effective behaviorally relevant signals. In our study, we assume raw signals are additively composed of behaviorally-relevant and irrelevant signals. We define irrelevant signals as those remaining after subtracting relevant signals from raw signals. Therefore, we believe our definition is clearly articulated.
Thank you for your valuable feedback.
(1) Sani, Omid G., et al. "Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification." Nature Neuroscience 24.1 (2021): 140-149.
(2) Buetfering, Christina, et al. "Behaviorally relevant decision coding in primary somatosensory cortex neurons." Nature neuroscience 25.9 (2022): 1225-1236.
(3) Wang, Fang, et al. "Quantized attention-gated kernel reinforcement learning for brain–machine interface decoding." IEEE transactions on neural networks and learning systems 28.4 (2015): 873-886.
(4) Dyer, Eva L., et al. "A cryptography-based approach for movement decoding." Nature biomedical engineering 1.12 (2017): 967-976.
(5) Ahmadi, Nur, Timothy G. Constandinou, and Christos-Savvas Bouganis. "Robust and accurate decoding of hand kinematics from entire spiking activity using deep learning." Journal of Neural Engineering 18.2 (2021): 026011.
(6) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.
(7) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature neuroscience 17.3 (2014): 440-448.
(8) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature communications 7.1 (2016): 13239.
(9) Ramkumar, Pavan, et al. "Premotor and motor cortices encode reward." PloS one 11.8 (2016): e0160851.
Q3: “The authors seem to have missed the spirit of my critique: to say "linear readout is performed in motor cortex" is an over-interpretation of what their model can show.”
Thank you for your comments. It's important to note that the conclusions we draw are speculative and not definitive. We use terms like "suggest" to reflect this uncertainty. To further emphasize the conjectural nature of our conclusions, we have deliberately moderated our tone.
The question of whether behaviorally-relevant signals can be accessed by downstream brain regions hinges on the debate over whether the brain employs a strategy of filtering before decoding. If the brain employs such a strategy, the brain can probably access these signals. In our view, it is likely that the brain utilizes this strategy.
Given the existence of behaviorally relevant signals, it is reasonable to assume that the brain has intrinsic mechanisms to differentiate between relevant and irrelevant signals. There is growing evidence suggesting that the brain utilizes various mechanisms, such as attention and specialized filtering, to suppress irrelevant signals and enhance relevant signals [1-3]. Therefore, it is plausible that the brain filters before decoding, thereby effectively accessing behaviorally relevant signals.
Regarding the question of whether the brain employs linear readout, given the limitations of current observational methods and our incomplete understanding of brain mechanisms, it is challenging to ascertain whether the brain employs a linear readout. In many cortical areas, linear decoders have proven to be sufficiently accurate. Consequently, numerous studies [4, 5, 6], including the one you referenced [4], directly employ linear decoders to extract information and formulate conclusions based on the decoding results. Contrary to these approaches, our research has compared the performance of linear and nonlinear decoders on behaviorally relevant signals and found their decoding performance is comparable. Considering both the decoding accuracy and model complexity, our results suggest that the motor cortex may utilize linear readout to decode information from relevant signals. Given the current technological limitations, we consider it reasonable to analyze collected data to speculate on the potential workings of the brain, an approach that many studies have also embraced [7-10]. For instance, a study [7] deduces strategies the brain might employ to overcome noise by analyzing the structure of recorded data and decoding outcomes for new stimuli.
Thank you for your valuable feedback.
(1) Sreenivasan, Sameet, and Ila Fiete. "Grid cells generate an analog error-correcting code for singularly precise neural computation." Nature neuroscience 14.10 (2011): 1330-1337.
(2) Schneider, David M., Janani Sundararajan, and Richard Mooney. "A cortical filter that learns to suppress the acoustic consequences of movement." Nature 561.7723 (2018): 391-395.
(3) Nakajima, Miho, L. Ian Schmitt, and Michael M. Halassa. "Prefrontal cortex regulates sensory filtering through a basal ganglia-to-thalamus pathway." Neuron 103.3 (2019): 445-458.
(4) Jurewicz, Katarzyna, et al. "Irrational choices via a curvilinear representational geometry for value." bioRxiv (2022): 2022-03.
(5) Hong, Ha, et al. "Explicit information for category-orthogonal object properties increases along the ventral stream." Nature neuroscience 19.4 (2016): 613-622.
(6) Chang, Le, and Doris Y. Tsao. "The code for facial identity in the primate brain." Cell 169.6 (2017): 1013-1028.
(7) Ganmor, Elad, Ronen Segev, and Elad Schneidman. "A thesaurus for a neural population code." Elife 4 (2015): e06134.
(8) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.
(9) Gallego, Juan A., et al. "Cortical population activity within a preserved neural manifold underlies multiple motor behaviors." Nature communications 9.1 (2018): 4233.
(10) Gallego, Juan A., et al. "Long-term stability of cortical population dynamics underlying consistent behavior." Nature neuroscience 23.2 (2020): 260-270.
Q4: “Agreeing with my critique is not sufficient; please provide the data or simulations that provides the context for the reference in the fano factor. I believe my critique is still valid.”
Thank you for your comments. As we previously replied, Churchland's research examines the variability of neural signals across different stages, including the preparation and execution phases, as well as before and after the target appears. Our study, however, focuses exclusively on the movement execution phase. Consequently, we are unable to produce comparative displays similar to those in his research. Intuitively, one might expect that the variability of behaviorally relevant signals would be lower; however, since no prior studies have accurately extracted such signals, the specific FF values of behaviorally relevant signals remain unknown. Therefore, presenting these values is meaningful, and can provide a reference for future research. While we cannot compare FF across different stages, we can numerically compare the values to the Poisson count process. An FF of 1 indicates a Poisson firing process, and our experimental data reveals that most neurons have an FF less than 1, indicating that the variance in firing counts is below the mean. Thank you for your valuable feedback.
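For reference, the Fano factor comparison with a Poisson process can be sketched as follows. This is a minimal example assuming NumPy and assuming FF is computed per neuron as the across-trial variance of spike counts divided by their mean in a fixed window; the synthetic counts are illustrative and are not data from the study.

```python
import numpy as np

def fano_factor(spike_counts):
    """Fano factor per neuron: variance of counts across trials / mean count.

    spike_counts: array of shape (n_trials, n_neurons), counts in a fixed window.
    """
    counts = np.asarray(spike_counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)  # unbiased variance across trials
    # A neuron that never fires has an undefined Fano factor; return NaN for it.
    return np.divide(var, mean, out=np.full_like(mean, np.nan), where=mean > 0)

rng = np.random.default_rng(0)
# Poisson counts: variance equals the mean, so FF is close to 1.
poisson_counts = rng.poisson(lam=5.0, size=(5000, 1))
# Counts that are always 5 or 6: variance far below the mean, so FF is well below 1.
regular_counts = 5 + rng.integers(0, 2, size=(5000, 1))

ff_poisson = fano_factor(poisson_counts)[0]
ff_regular = fano_factor(regular_counts)[0]
```

An FF below 1 therefore indicates firing that is more regular across trials than a Poisson process with the same mean rate.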
To Reviewer #4
Q1: “Overall, studying neural computations that are behaviorally relevant or not is an important problem, which several previous studies have explored (for example PSID in (Sani et al. 2021), TNDM in (Hurwitz et al. 2021), TAME-GP in (Balzani et al. 2023), pi-VAE in (Zhou and Wei 2020), and dPCA in (Kobak et al. 2016), etc). However, this manuscript does not properly put their work in the context of such prior works. For example, the abstract states "One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive", which is not the case given that these prior works have done that. The same is true for various claims in the main text, for example "Furthermore, we found that the dimensionality of primary subspace of raw signals (26, 64, and 45 for datasets A, B, and C) is significantly higher than that of behaviorally-relevant signals (7, 13, and 9), indicating that using raw signals to estimate the neural dimensionality of behaviors leads to an overestimation" (line 321). This finding was presented in (Sani et al. 2021) and (Hurwitz et al. 2021), which is not clarified here. This issue of putting the work in context has been brought up by other reviewers previously but seems to remain largely unaddressed. The introduction is inaccurate also in that it mixes up methods that were designed for separation of behaviorally relevant information with those that are unsupervised and do not aim to do so (e.g., LFADS). The introduction should be significantly revised to explicitly discuss prior models/works that specifically formulated this behavior separation and what these prior studies found, and how this study differs.”
Thank you for your comments. Our statement that “One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive” is accurate. To the best of our knowledge, no prior work has done this, that is, accurately separated behaviorally relevant neural signals at both single-neuron and single-trial resolution. The works you mentioned have not explicitly proposed extracting behaviorally relevant signals, nor have they identified and addressed the key challenge of extracting relevant signals, namely determining the optimal degree of similarity between the generated relevant signals and raw signals. Those works focus on latent neural dynamics rather than on the signal level.
To clearly set apart d-VAE from other models, we have framed the extraction of behaviorally relevant signals as the following mathematical optimization problem:
$$\min_{\boldsymbol{x}_r} \; E(\boldsymbol{x}_r, \boldsymbol{x}) + R(\boldsymbol{x}_r)$$
where $\boldsymbol{x}_r$ denotes generated behaviorally-relevant signals, $\boldsymbol{x}$ denotes raw noisy signals, $E(\cdot,\cdot)$ denotes reconstruction loss, and $R(\cdot)$ denotes regularization loss. It is important to note that while both d-VAE and TNDM employ reconstruction loss, relying solely on this term is insufficient for determining the optimal degree of similarity between the generated and raw noisy signals. The key to accurately extracting behaviorally relevant signals lies in leveraging prior knowledge about these signals to determine the optimal similarity degree, encapsulated by $R(\boldsymbol{x}_r)$. None of the works you mentioned includes the key term $R(\boldsymbol{x}_r)$.
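A toy sketch of why the reconstruction term alone cannot fix the degree of similarity, assuming NumPy. The data are synthetic, and the regularizer `behaviour_prior` (a penalty on the error of a fixed linear readout) is a hypothetical stand-in for R, not the regularizer used in d-VAE.

```python
import numpy as np

def objective(x_r, x, regularizer):
    """E(x_r, x) + R(x_r), with mean squared error as the reconstruction loss E."""
    reconstruction = float(np.mean((x_r - x) ** 2))
    return reconstruction + regularizer(x_r)

rng = np.random.default_rng(0)
y = rng.standard_normal(2000)                 # toy behavioral variable
relevant = 2.0 * y                            # toy ground-truth relevant signal
x = relevant + rng.standard_normal(2000)      # raw signal = relevant + irrelevant

def no_prior(x_r):
    # R = 0: only the reconstruction term remains.
    return 0.0

def behaviour_prior(x_r, weight=8.0):
    # Hypothetical R: penalize the error of the fixed readout y_hat = x_r / 2.
    return weight * float(np.mean((x_r / 2.0 - y) ** 2))

# Without R, copying the raw signal gives zero loss, so it is always preferred.
copy_only_E = objective(x, x, no_prior)
relevant_only_E = objective(relevant, x, no_prior)

# With R, the copy is penalized for the irrelevant part it carries along.
copy_with_R = objective(x, x, behaviour_prior)
relevant_with_R = objective(relevant, x, behaviour_prior)
```

In this toy setting the raw-signal copy minimizes E on its own, while adding R makes the ground-truth relevant signal the better candidate.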
Regarding the dimensionality estimation, the dimensionality of neural manifolds quantifies the degrees of freedom required to describe population activity without significant information loss.
There are two differences between our work and PSID and TNDM.
First, the dimensions they refer to are fundamentally different from ours. The dimensionality we describe pertains to a linear subspace, where a neural dimension (also called a neural mode or principal component basis vector) is a vector of length N, with N representing the number of neurons. However, the vector length of a neural mode differs between PSID and our approach: PSID requires concatenating multiple time steps T, which essentially makes a neural mode a vector of length N × T. TNDM, on the other hand, involves nonlinear dimensionality reduction, which is different from linear dimensionality reduction.
Second, we estimate neural dimensionality by explaining the variance of neural signals, whereas PSID and TNDM determine dimensionality through decoding performance saturation. It is important to note that the dimensionality at which decoding performance saturates may not accurately reflect the true dimensionality of neural manifolds, as some dimensions may contain redundant information that does not enhance decoding performance.
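A variance-explained estimate of linear dimensionality can be sketched as below, assuming NumPy. The 90% threshold, the function name, and the synthetic data are assumptions chosen for illustration; they are not the settings used in the manuscript.

```python
import numpy as np

def variance_dimensionality(signals, threshold=0.9):
    """Smallest number of principal components whose cumulative explained
    variance reaches `threshold`.

    signals: array of shape (n_samples, n_neurons).
    """
    centered = signals - signals.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to PC variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    n_components = int(np.searchsorted(cumulative, threshold) + 1)
    return min(n_components, len(cumulative))

rng = np.random.default_rng(0)
latents = rng.standard_normal((1000, 3))            # 3 underlying dimensions
mixing = rng.standard_normal((3, 20))               # 20 toy neurons
relevant = latents @ mixing                         # low-dimensional signal
raw = relevant + 2.0 * rng.standard_normal((1000, 20))  # add high-dimensional noise

dim_relevant = variance_dimensionality(relevant)
dim_raw = variance_dimensionality(raw)
```

Because the added noise spreads variance over all neurons, the estimate for the raw toy signals exceeds that for the noise-free ones, which mirrors the overestimation argument above.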
We acknowledge that while LFADS can generate signals that contain some behavioral information, it was not specifically designed to do so. Following your suggestion, we have removed this reference from the Introduction.
Thank you for your valuable feedback.
Q2: “Claims about linearity of "motor cortex" readout are not supported by results yet stated even in the abstract. Instead, what the results support is that for decoding behavior from the output of the dVAE model -- that is trained specifically to have a linear behavior readout from its embedding -- a nonlinear readout does not help. This result can be biased by the very construction of the dVAE's loss that encourages a linear readout/decoding from embeddings, and thus does not imply a finding about motor cortex.”
Thank you for your comments. We respectfully disagree with the notion that relevant signals can be linearly decoded only because of constraints that make the embedding linearly decodable. An embedding reorganizes or transforms the structure of the original signals, and the fact that the embedding can be linearly decoded does not mean that the corresponding signals can be linearly decoded.
Let's clarify this with three intuitive examples:
Example 1: Image denoising is a well-established field. Whether employing supervised or blind denoising methods [1, 2], both can effectively recover the original image. This denoising process closely resembles the extraction of behaviorally relevant signals from raw signals. Consider if noisy images are not amenable to linear decoding (classification); would removing the noise enable linear decoding? The answer is no. Typically, the noise in images captured under normal conditions is minimal, yet even the clear images remain challenging to decode linearly.
Example 2: Consider the task of face recognition, where face images are set against various backgrounds. In this context, the pixels representing the face correspond to relevant signals, while the background pixels are considered irrelevant. Suppose a network is capable of extracting the face pixels and the resulting embedding can be linearly decoded. Can the face pixels themselves be linearly decoded? The answer is no. If linear decoding of face pixels were feasible, the challenging task of face recognition could be easily resolved by merely extracting the face from the background and training a linear classifier.
Example 3: In the MNIST dataset, the background is uniformly black, and its impact is minimal. However, linear SVM classifiers used directly on the original pixels significantly underperform compared to non-linear SVMs.
In summary, embedding involves reorganizing the structure of the original signals through a feature transformation function. However, the reconstruction process can recover the structure of the original signals from the embedding. The fact that the structure of the embedding can be linearly decoded does not imply that the structure of the original signals can be linearly decoded in the same way. It is inappropriate to focus on the compression process without equally considering the reconstruction process.
Thank you for your valuable feedback.
(1) Mao, Xiao-Jiao, Chunhua Shen, and Yu-Bin Yang. "Image restoration using convolutional auto-encoders with symmetric skip connections." arXiv preprint arXiv:1606.08921 (2016).
(2) Lehtinen, Jaakko, et al. "Noise2Noise: Learning image restoration without clean data." International Conference on Machine Learning. International Machine Learning Society, 2018.
Q3: “Related to the above, it is unclear what the manuscript means by readout from motor cortex. A clearer definition of "readout" (a mapping from what to what?) in general is needed. The mapping that the linearity/nonlinearity claims refer to is from the *inferred* behaviorally relevant neural signals, which themselves are inferred nonlinearly using the VAE. This should be explicitly clarified in all claims, i.e., that only the mapping from distilled signals to behavior is linear, not the whole mapping from neural data to behavior. Again, to say the readout from motor cortex is linear is not supported, including in the abstract.”
Thank you for your comments. We have revised the manuscript to make this clearer. Thank you for your valuable feedback.
Q4: “Claims about individual neurons are also confounded. The d-VAE distilling processing is a population level embedding so the individual distilled neurons are not obtainable on their own without using the population data. This population level approach also raises the possibility that information can leak from one neuron to another during distillation, which is indeed what the authors hope would recover true information about individual neurons that wasn't there in the recording (the pixel denoising example). The authors acknowledge the possibility that information could leak to a neuron that didn't truly have that information and try to rule it out to some extent with some simulations and by comparing the distilled behaviorally relevant signals to the original neural signals. But ultimately, the distilled signals are different enough from the original signals to substantially improve decoding of low information neurons, and one cannot be sure if all of the information in distilled signals from any individual neuron truly belongs to that neuron. It is still quite likely that some of the improved behavior prediction of the distilled version of low-information neurons is due to leakage of behaviorally relevant information from other neurons, not the former's inherent behavioral information. This should be explicitly acknowledged in the manuscript.”
Thank you for your comments. We value your insights regarding the mixing process. However, we are confident in the robustness of our conclusions. We respectfully disagree with the notion that the significant information found in small R2 neurons is primarily due to leakage, and we base our disagreement on four key reasons.
(1) Neural reconstruction performance is a reliable and valid criterion.
The purpose of latent variable models is to explain neuronal activity as much as possible. Given that the ground truth of the behaviorally-relevant signals, the latent variables, and the generative model is unknown, the only reliable reference at the signal level is the raw signals. A crucial criterion for evaluating the reliability of latent variable models (including latent variables and generated relevant signals) is their capability to effectively explain the raw signals [1]. Consequently, we maintain that if the generated signals resemble the raw signals as closely as possible, then, in accordance with an equivalence principle, these signals faithfully retain the inherent properties of single neurons.
Reviewer #4 appears to focus on the compression (mixing) process without giving equal consideration to the reconstruction (de-mixing) process. Numerous studies have demonstrated that deep autoencoders can reconstruct the original signal very effectively. For example, in the field of image denoising, autoencoders are capable of accurately restoring the original image [2, 3]. If one focuses only on the mixing step and ignores the reconstruction (de-mixing) step, one will not accept the result even when the only criterion available at the signal level, reconstruction performance, is high. If this were the case, many problems would become unsolvable. For instance, a fundamental criterion for latent variable models is their ability to explain the original data. If the ground truth of the latent variables remains unknown and the reconstruction criterion is disregarded, how can we validate the effectiveness of the model, the validity of the latent variables, or ensure that findings related to latent variables are not merely by-products of the model? Therefore, we disagree with the aforementioned notion. We believe that as long as the reconstruction performance is satisfactory, the extracted signals have successfully retained the characteristics of individual neurons.
In our paper, we have shown in various ways that our generated signals sufficiently resemble the raw signals, including visualizing neuronal activity (Fig. 2m, Fig. 3i, and Fig. S5), achieving the highest performance among competitors (Fig. 2d, h, l), and conducting control analyses. Therefore, we believe our results are reliable.
(1) Cunningham, J.P. and Yu, B.M., 2014. Dimensionality reduction for large-scale neural recordings. Nature neuroscience, 17(11), pp.1500-1509.
(2) Mao, Xiao-Jiao, Chunhua Shen, and Yu-Bin Yang. "Image restoration using convolutional auto-encoders with symmetric skip connections." arXiv preprint arXiv:1606.08921 (2016).
(3) Lehtinen, Jaakko, et al. "Noise2Noise: Learning image restoration without clean data." International Conference on Machine Learning. International Machine Learning Society, 2018.
(2) There is no reason for d-VAE to add signals that do not exist in the original signals.
(1) Adding signals that do not exist in the small R2 neurons would decrease the reconstruction performance. If the added signals contained significant information, they would not resemble the irrelevant signals, which contain no information, and thus the generated signals would not resemble the raw signals. The model is optimized to reduce the reconstruction loss, so this scenario runs against the model's optimization direction. It is worth mentioning that when the model has only the reconstruction loss, without the interference of the decoding loss, we believe that information leakage does not happen: the model can only be optimized toward similarity with the raw signals, and adding non-existent signals to the generated signals would increase the reconstruction loss, contrary to the objective of optimization.
(2) Information carried by these additional signals would be redundant with that of larger R2 neurons, so it would not introduce new information that enhances the decoding performance of the neural population, and hence would not reduce the decoding loss.
Based on these two points, we believe the model would not perform such counterproductive and harmful operations.
(3) The criterion that irrelevant signals should contain minimal information can effectively rule out the leakage scenario.
The criterion that irrelevant signals should contain minimal information is very important, but it seems that Reviewer #4 has continuously overlooked its significance. If the model's reconstruction is insufficient, or if additional information is added (which we do not believe will happen), the residuals would decode a large amount of information, and this criterion would exclude selecting such signals. To clarify, if we assume that x, y, and z denote the raw, relevant, and irrelevant signals of smaller R2 neurons, with x=y+z, and the extracted relevant signals become y+m, then the irrelevant signals become z-m. Consequently, the irrelevant signals contain a significant amount of information.
We presented the decoding R2 for irrelevant signals in real datasets under three distillation scenarios: a bias towards reconstruction (alpha=0, an extreme case where the model only has reconstruction loss without decoding loss), a balanced trade-off, and a bias towards decoding (alpha=0.9), as detailed in Author response table 1. If significant information in small R2 neurons were leaked from large R2 neurons, the irrelevant signals should contain a large amount of information. However, our results indicate that the irrelevant signals contain only minimal information, and their performance closely resembles that of the model trained solely with reconstruction loss, showing no significant differences (P > 0.05, Wilcoxon rank-sum test). When the model leans towards decoding, some useful information is left in the residuals, and the irrelevant signals contain a substantial amount of information, as observed in Author response table 1 for alpha=0.9. Therefore, we do not choose these signals for analysis.
In conclusion, the criterion that irrelevant signals should contain minimal information is a very effective measure to exclude undesirable signals.
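The x = y + z argument can be checked on a toy example. The sketch below assumes NumPy and uses synthetic signals for a single small R2 neuron; the leaked term m and all scales are hypothetical.

```python
import numpy as np

def decoding_r2(signal, behaviour):
    """R2 of a least-squares linear fit (with intercept) from one signal to behaviour."""
    design = np.column_stack([signal, np.ones_like(signal)])
    coef, *_ = np.linalg.lstsq(design, behaviour, rcond=None)
    residual = behaviour - design @ coef
    return 1.0 - residual.var() / behaviour.var()

rng = np.random.default_rng(0)
n = 5000
behaviour = rng.standard_normal(n)
y = 0.1 * behaviour            # weak true relevant signal of a small R2 neuron
z = rng.standard_normal(n)     # irrelevant signal, independent of behaviour
x = y + z                      # raw signal

m = 1.0 * behaviour            # hypothetical information leaked from other neurons

r2_clean = decoding_r2(x - y, behaviour)         # residual is z
r2_leaky = decoding_r2(x - (y + m), behaviour)   # residual is z - m
```

With correct extraction the residual carries almost no behavioral information, whereas a leaked term m reappears in the residual with opposite sign and makes it decodable, which is what the criterion is designed to detect.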
Author response table 1.
Decoding R2 of irrelevant signals
(4) Synthetic experiments can effectively rule out the leakage scenario.
In the absence of ground truth data, synthetic experiments serve as an effective method for validating models and are commonly employed [1-3].
Our experimental results demonstrate that d-VAE can effectively extract neural signals that more closely resemble actual behaviorally relevant signals (Fig. S2g). If there were information leakage, it would decrease the similarity to the ground truth signals, hence we have ruled out this possibility. Moreover, in synthetic experiments with small R2 neurons (Fig. S10), results also demonstrate that our model could make these neurons more closely resemble ground truth relevant signals and recover their information.
In summary, synthetic experiments strongly demonstrate that our model can recover obscured neuronal information, rather than adding signals that do not exist.
(1) Pnevmatikakis, Eftychios A., et al. "Simultaneous denoising, deconvolution, and demixing of calcium imaging data." Neuron 89.2 (2016): 285-299.
(2) Schneider, Steffen, Jin Hwa Lee, and Mackenzie Weygandt Mathis. "Learnable latent embeddings for joint behavioural and neural analysis." Nature 617.7960 (2023): 360-368.
(3) Zhou, Ding, and Xue-Xin Wei. "Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE." Advances in Neural Information Processing Systems 33 (2020): 7234-7247.
Based on these four points, we are confident in the reliability of our results. If Reviewer #4 considers these points insufficient, we would highly appreciate it if specific concerns regarding any of these aspects could be detailed.
Thank you for your valuable feedback.
Q5: “Given the nuances involved in appropriate comparisons across methods and since two of the datasets are public, the authors should provide their complete code (not just the dVAE method code), including the code for data loading, data preprocessing, model fitting and model evaluation for all methods and public datasets. This will alleviate concerns and allow readers to confirm conclusions (e.g., figure 2) for themselves down the line.”
Thanks for your suggestion.
Our code is now available on GitHub at https://github.com/eric0li/d-VAE. Thank you for your valuable feedback.
Q6: “Related to 1) above, the authors should explore the results if the affine network h(.) (from embedding to behavior) was replaced with a nonlinear ANN. Perhaps linear decoders would no longer be as close to nonlinear decoders. Regardless, the claim of linearity should be revised as described in 1) and 2) above, and all caveats should be discussed.”
Thank you for your suggestion. We appreciate your feasible proposal that can be empirically tested. Following your suggestion, we have replaced the decoding of the latent variable z to behavior y with a nonlinear neural network, specifically a neural network with a single hidden layer. The modified model is termed d-VAE2. We applied d-VAE2 to the real data and selected the optimal alpha using the validation set. As shown in Author response table 2, the performance of KF and ANN remains comparable. Therefore, the capacity to linearly decode behaviorally relevant signals does not stem from the linear decoding of embeddings.
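The difference between the two readout heads can be sketched as follows, assuming NumPy. This is not the released d-VAE code: the layer sizes, the tanh activation, and the random weights are all hypothetical, since the response does not specify them.

```python
import numpy as np

def affine_readout(z, weights, bias):
    """Affine head: behavior is an affine function of the latent z."""
    return z @ weights + bias

def one_hidden_layer_readout(z, w_in, b_in, w_out, b_out):
    """Nonlinear head in the spirit of d-VAE2: one hidden layer between z and behavior."""
    hidden = np.tanh(z @ w_in + b_in)  # activation choice is an assumption
    return hidden @ w_out + b_out

rng = np.random.default_rng(0)
latent_dim, hidden_dim, behaviour_dim = 8, 16, 2   # hypothetical sizes
z = rng.standard_normal((5, latent_dim))

W = rng.standard_normal((latent_dim, behaviour_dim))
b = np.zeros(behaviour_dim)
W_in = rng.standard_normal((latent_dim, hidden_dim))
b_in = np.zeros(hidden_dim)
W_out = rng.standard_normal((hidden_dim, behaviour_dim))
b_out = np.zeros(behaviour_dim)

y_affine = affine_readout(z, W, b)
y_mlp = one_hidden_layer_readout(z, W_in, b_in, W_out, b_out)
```

Both heads map the same latent to the same behavioral dimensions; only the second removes the linear constraint between embedding and behavior.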
Author response table 2.
Decoding R2 of behaviorally relevant signals obtained by d-VAE2
Additionally, it is worth noting that this approach is uncommon and is considered somewhat inappropriate according to the Information Bottleneck theory [1]. According to the Information Bottleneck theory, information is progressively compressed in multilayer neural networks, discarding what is irrelevant to the output and retaining what is relevant. This means that as the number of layers increases, the mutual information between each layer's embedding and the model input gradually decreases, while the mutual information between each layer's embedding and the model output gradually increases. For the decoding part, if embeddings that are not the closest to the output (behaviors) are used, these embeddings might contain behaviorally irrelevant signals. Using these embeddings to generate behaviorally relevant signals could lead to the inclusion of irrelevant signals in the behaviorally relevant signals.
To demonstrate the above statement, we conducted experiments on the synthetic data. As shown in Author response table 3, we present the performance (neural R2 between the generated signals and the ground truth signals) of both models at several alpha values around the optimal alpha of d-VAE (alpha=0.9) selected by the validation set. The results show that, at the same alpha value, the performance of d-VAE2 is consistently inferior to that of d-VAE; d-VAE2 requires a higher alpha value to achieve performance comparable to d-VAE; and the best performance of d-VAE2 is inferior to that of d-VAE.
Author response table 3.
Neural R2 between generated signals and real behaviorally relevant signals
Thank you for your valuable feedback.
(1) Shwartz-Ziv, Ravid, and Naftali Tishby. "Opening the black box of deep neural networks via information." arXiv preprint arXiv:1703.00810 (2017).
Q7: “The beginning of the section on the "smaller R2 neurons" should clearly define what R2 is being discussed. Based on the response to previous reviewers, this R2 "signifies the proportion of neuronal activity variance explained by the linear encoding model, calculated using raw signals". This should be mentioned and made clear in the main text whenever this R2 is referred to.”
Thank you for your suggestion. We have made the modifications in the main text. Thank you for your valuable feedback.
Q8: “Various terms require clear definitions. The authors sometimes use vague terminology (e.g., "useless") without a clear definition. Similarly, discussions regarding dimensionality could benefit from more precise definitions. How is neural dimensionality defined? For example, how is "neural dimensionality of specific behaviors" (line 590) defined? Related to this, I agree with Reviewer 2 that a clear definition of irrelevant should be mentioned that clarifies that relevance is roughly taken as "correlated or predictive with a fixed time lag". The analyses do not explore relevance with arbitrary time lags between neural and behavior data.”
Thanks for your suggestion. We have removed the “useless” statements and have revised the statement of “the neural dimensionality of specific behaviors” in our revised manuscript.
Regarding the use of fixed temporal lags, we followed the same practice as papers related to the dataset we use, which assume fixed temporal lags [1-3]. Furthermore, many studies in the motor cortex similarly use fixed temporal lags [4-6]. To clarify the definition, we have revised the definition in our manuscript. For details, please refer to the response to Q2 of reviewer #2 and our revised manuscript. We believe our definition is clearly articulated.
Thank you for your valuable feedback.
(1) Wang, Fang, et al. "Quantized attention-gated kernel reinforcement learning for brain–machine interface decoding." IEEE transactions on neural networks and learning systems 28.4 (2015): 873-886.
(2) Dyer, Eva L., et al. "A cryptography-based approach for movement decoding." Nature biomedical engineering 1.12 (2017): 967-976.
(3) Ahmadi, Nur, Timothy G. Constandinou, and Christos-Savvas Bouganis. "Robust and accurate decoding of hand kinematics from entire spiking activity using deep learning." Journal of Neural Engineering 18.2 (2021): 026011.
(4) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.
(5) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature neuroscience 17.3 (2014): 440-448.
(6) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature communications 7.1 (2016): 13239.
Q9: “CEBRA itself doesn't provide a neural reconstruction from its embeddings, but one could obtain one via a regression from extracted CEBRA embeddings to neural data. In addition to decoding results of CEBRA (figure S3), the neural reconstruction of CEBRA should be computed and CEBRA should be added to Figure 2 to see how the behaviorally relevant and irrelevant signals from CEBRA compare to other methods.”
Thank you for your question. Modifying CEBRA is beyond the scope of our work. As CEBRA is not a generative model, it cannot obtain behaviorally relevant and irrelevant signals, and therefore it lacks the results presented in Fig. 2. To spare our readers the confusion encountered by reviewers #3 and #4, we have opted to exclude the comparison with CEBRA. It is crucial to note, as previously stated, that our assessment of decoding capabilities has been benchmarked against the performance of the ANN on raw signals, which almost represents the upper limit of performance. Consequently, omitting CEBRA does not affect our conclusions.
Thank you for your valuable feedback.
Q10: “Line 923: "The optimal hyperparameter is selected based on the lowest averaged loss of five-fold training data." => why is this explained specifically under CEBRA? Isn't the same criteria used for hyperparameters of other methods? If so, clarify.”
Thank you for your question. The hyperparameter selection for CEBRA follows the practice of the original CEBRA paper. The hyperparameter selection for generative models is detailed in the Section “The strategy for selecting effective behaviorally-relevant signals”. Thank you for your valuable feedback.
-
-
eLife assessment
This study presents an important finding on the relationship between brain activity related to sustained attention and substance use in adolescence/early adulthood with a large longitudinal dataset. The evidence supporting the claims of the authors is convincing. The work will be of interest to cognitive neuroscientists, psychologists, and clinicians working on substance use or addiction.
-
Reviewer #1 (Public Review):
This study explored the relationship between sustained attention and substance use from ages 14 to 23 in a large longitudinal dataset. They found behaviour and brain connectivity associated with poorer sustained attention at age 14 predicted subsequent increase in cannabis and cigarette smoking from ages 14-23. They concluded that the brain network of sustained attention is a robust biomarker for vulnerability to substance use. The big strength of the study is a substantial sample size and validation of the generalization to an external dataset. In addition, various methods/models were used to prove the relationship between sustained attention and substance use over time.
-
Reviewer #2 (Public Review):
Weng and colleagues investigated the relationship between sustained attention and substance use in a large cohort across three longitudinal visits (ages 14, 19, and 23). They employed a stop signal task to assess sustained attention and utilized the Timeline Followback self-report questionnaire to measure substance use. They assessed the linear relationship between sustained attention-associated functional connections and substance use at an earlier visit (age 14 or 19). Subsequently, they utilized this relationship along with the functional connection profile at a later age (age 19 or 23) to predict substance use at those respective ages. The authors found that connections in association with reduced sustained attention predicted subsequent increases in substance use, a conclusion validated in an external dataset. Altogether, the authors suggest that sustained attention could serve as a robust biomarker for predicting future substance use.
This study by Weng and colleagues focused on an important topic of substance use prediction in adolescence/early adulthood.
-
Reviewer #3 (Public Review):
Summary:
Weng and colleagues investigated the association between attention-related connectivity and substance use. They conducted a study with a sizable sample of over 1,000 participants, collecting longitudinal data at ages 14, 19, and 23. Their findings indicate that behaviors and brain connectivity linked to sustained attention at age 14 forecasted subsequent increases in cigarette and cannabis use from ages 14 to 23. However, early substance use did not predict future attention levels or attention-related connectivity strength.
Strengths:
The study's primary strength lies in its large sample size and longitudinal design spanning three time-points. A robust predictive analysis was employed, demonstrating that diminished sustained attention behavior and connectivity strength predict substance use, while early substance use does not forecast future attention-related behavior or connectivity strength.
Weaknesses:
It's questionable whether the prediction approach (i.e., CPM), even when combined with longitudinal data, can establish causality. I recommend removing the term 'consequence' in the abstract and replacing it with 'predict'. Additionally, the paper could benefit from enhanced rigor through additional analyses, such as testing various thresholds and conducting lagged effect analyses with covariate regression.
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1 (Recommendations For The Authors):
Although the manuscript is well organized and written, it could be largely improved and therefore made more plausible and easier to read. See my point-by-point comments listed below:
(1) The introduction section is a bit overloaded with some unnecessary information. For example, the authors discussed the relationship between neurotransmitters in the prefrontal and striatum and substance use/sustained attention. However, the results are related to neither the neurotransmitters nor the striatum. In addition, there is a contradictory description about neurotransmitters there, Nicotine/THC leads to increased neurotransmitters, and decreased neurotransmitters is related to poor sustained attention. Does that mean that the use of Nicotine/THC could increase sustained attention?
Thanks for this insightful question. We understand your concern regarding the seemingly contradictory statements about neurotransmitters and sustained attention. Previous studies have shown that acute administration of nicotine can improve sustained attention (Lawrence et al., 2002; Potter and Newhouse, 2008; Valentine and Sofuoglu, 2018; Young et al., 2004). On the other hand, the acute effects of smoking cannabis on sustained attention are mixed and depend on factors such as dosage and individual differences (Crean et al., 2011). For instance, a previous study (Hart et al., 2001) found that performance on a tracking task, which requires sustained attention, improved significantly after smoking cannabis with a high dose of THC, albeit in experienced cannabis users. However, chronic substance use, including nicotine and cannabis, has been associated with impaired sustained attention (Chamberlain et al., 2012; Dougherty et al., 2013).
To address your concerns and improve clarity and succinctness of the Introduction, we have removed the description of neurotransmitters from the Introduction. This revision should make the introduction more concise and focus on the direct relationships pertinent to our study.
(2) It is a bit hard to follow the story for the readers because the Results section went straight into detail. For example, the authors directly introduced that they used the ICV from the Go trials to index sustained attention without basic knowledge about the task. Why use the ICV of Go trials instead of other trials (i.e., successful stop trials) as an index of sustained attention? I suggest presenting the subjects and task details about the data before the detailed behavioral results. The results section should include enough information to understand the presenting results for the readers, rather than forcing the reader to find the answer in the later Methods section.
We appreciate your suggestion to provide more context about the task and ICV before diving into the detailed behavioural results.
We used the ICV derived from the Go trials instead of Successful stop trials as an index of sustained attention, based on the nature of the stop-signal task and the specific data it generates. Previous studies have indicated that reaction time (RT) variability is a straightforward measure of sustained attention, with increasing variability thought to reflect poorer ability to sustain attention (Esterman and Rothlein, 2019). RT variability is defined as ICV, calculated as the standard deviation of Go RT divided by the mean Go RT from Go trials (O'Halloran et al., 2018). The stop signal task includes both Go trials and stop trials. During Go trials, participants are required to respond as quickly and accurately as possible to a Go signal, allowing for the recording of RT for calculating ICV. In contrast, stop trials are designed to measure inhibitory control, where successful response inhibition results in no RT or response recorded in the output. Therefore, Go trials are specifically used to assess sustained attention, while Stop trials primarily assess inhibitory control (Verbruggen et al., 2019).
We acknowledge the importance of providing this contextual information within the Results section to enhance reader understanding. We have added this information before presenting the behavioural results on Page 6.
Results
(1) Behavioural changes over time
Reaction time (RT) variability is a straightforward measure of sustained attention, with increasing variability thought to reflect poor sustained attention. RT variability is defined as the intra-individual coefficient of variation (ICV), calculated as the standard deviation of Go RT divided by the mean Go RT from Go trials in the stop signal task. Lower ICV indicates better sustained attention.
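As an illustration of the ICV definition above, the following minimal sketch computes the ratio of the standard deviation to the mean of Go-trial reaction times for one participant. The function name and the example reaction times are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the response does not state which estimator was used.

```python
from statistics import mean, stdev


def intra_individual_cv(go_rts_ms):
    """Return ICV = SD(Go RT) / mean(Go RT) for one participant.

    Uses the sample standard deviation (n - 1); this choice is an
    assumption, not something stated in the authors' response.
    """
    if len(go_rts_ms) < 2:
        raise ValueError("need at least two Go-trial reaction times")
    return stdev(go_rts_ms) / mean(go_rts_ms)


# Hypothetical reaction times in milliseconds for two participants.
steady_responder = [400, 410, 395, 405, 390]
variable_responder = [300, 520, 380, 610, 190]

icv_steady = intra_individual_cv(steady_responder)
icv_variable = intra_individual_cv(variable_responder)
```

Under this definition the steadier participant receives the lower ICV, which corresponds to better sustained attention.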
(3) The same problem for section 2 in the Results. What are the predictive networks? Are the predictive networks the same as the networks constructed based on the correlation with ICV? My intuitive feeling is that they are the circular analyses here. The positive/negative/combined networks are calculated based on the correlation between the edges and ICV. Then the author used the network to predict the ICV again. The manipulation from the raw networks (I think they are based on PPI) to the predictive network, and the calculation of the predicted ICV are all missing. The direct exposure of the results to the readers without enough detailed knowledge made everything hard to digest.
We thank the Reviewer for the insightful comment. We agree with the need for more clarity regarding the predictive networks and the CPM analysis before presenting results. CPM, a data-driven neuroscience approach, is applied to predict individual behaviour from brain functional connectivity (Rosenberg et al., 2016; Shen et al., 2017). The CPM analysis used the strength of the predictive network to predict the individual difference in traits and behaviours. CPM includes several steps: feature selection, feature summarization, model building, and assessment of prediction significance (see Fig. S1).
During feature selection, we assessed whether connections between brain areas (i.e., edges) in a task-related functional connectivity matrix (derived from generalized psychophysiological interaction analysis) were positively or negatively correlated with ICV using a significance threshold of P < 0.01. These positively or negatively correlated connections are regarded as the positive or negative network, respectively. The network strength of the positive network (or negative network) was determined in each individual by summing the connection strength of each positively (or negatively) correlated edge. The combined network was determined by subtracting the strength of the negative network from the positive network. Next, CPM built a linear model between the network strength of the predictive network and ICV. This model was initially developed using the training set. The predictive networks were then applied to the test set, where network strength was calculated again, and the linear model was used to predict ICV using k-fold cross-validation. Following your advice, we have updated the Results section to include these details on Page 7.
Results
(2) Cross-sectional brain connectivity
This study employed CPM, a data-driven neuroscience approach, to identify three predictive networks (positive, negative, and combined) that predict ICV from brain functional connectivity. CPM typically uses the strength of the predictive networks to predict individual differences in traits and behaviors. The predictive networks were obtained based on connectivity analyses of the whole brain. Specifically, we assessed whether connections between brain areas (i.e., edges) in a task-related functional connectivity matrix derived from generalized psychophysiological interaction analysis were positively or negatively correlated with ICV using a significance threshold of P < 0.01. These positively or negatively correlated connections were regarded as the positive or negative network, respectively. The network strength of positive networks (or negative networks) was determined for each individual by summing the connection strength of each positively (or negatively) correlated edge. The combined network was determined by subtracting the strength of the negative network from the positive network. We then built a linear model between network strength and ICV in the training set, computed network strength in the test set using the same predictive networks, and applied the linear model to obtain predicted ICV under k-fold cross-validation.
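The CPM steps described above (edge selection, network-strength summation, linear model, k-fold prediction) can be sketched roughly as follows. This is not the authors' code: all function names and the synthetic data are hypothetical, only the combined network is modelled, and the P < 0.01 edge threshold is replaced by a large-sample critical value of r derived from the normal distribution, which is an approximation introduced here to keep the sketch dependent on NumPy alone.

```python
import numpy as np
from statistics import NormalDist


def edge_behaviour_r(edges, y):
    """Pearson r between each edge (column of `edges`) and behaviour `y`."""
    ez = (edges - edges.mean(axis=0)) / edges.std(axis=0)
    yz = (y - y.mean()) / y.std()
    return ez.T @ yz / len(y)


def cpm_predict(train_edges, train_y, test_edges, alpha=0.01):
    """Fit a combined-network CPM on the training set, predict the test set."""
    n = len(train_y)
    # Assumption: large-sample stand-in for the two-sided P < alpha threshold.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    r_crit = z / np.sqrt(n - 2 + z ** 2)

    r = edge_behaviour_r(train_edges, train_y)
    pos_mask, neg_mask = r > r_crit, r < -r_crit  # feature selection

    def combined_strength(e):
        # Feature summarization: positive minus negative network strength.
        return e[:, pos_mask].sum(axis=1) - e[:, neg_mask].sum(axis=1)

    # Model building: one linear model from network strength to behaviour.
    slope, intercept = np.polyfit(combined_strength(train_edges), train_y, 1)
    return slope * combined_strength(test_edges) + intercept


def cpm_kfold(edges, y, k=10, seed=0):
    """Return out-of-fold predictions; edges are re-selected in every fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    predicted = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        predicted[test_idx] = cpm_predict(
            edges[train_mask], y[train_mask], edges[test_idx]
        )
    return predicted


# Synthetic example: 400 subjects, 200 edges, 20 of which carry signal.
rng = np.random.default_rng(1)
edges = rng.normal(size=(400, 200))
icv = (edges[:, :10].sum(axis=1) - edges[:, 10:20].sum(axis=1)
       + rng.normal(size=400))
predicted_icv = cpm_kfold(edges, icv)
r_pred = np.corrcoef(predicted_icv, icv)[0, 1]
```

Because edges are selected inside each training fold and only then applied to the held-out subjects, the correlation between predicted and observed ICV is not circular in the sense raised by the reviewer.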
(4) The authors showed the positive/negative/combined networks from both Go trials and successful stop trials can predict the ICV. I am wondering how the author could validate the specificity of the prediction of these positive/negative/combined networks. For example, how about the networks from the failed stop trials?
We appreciate the opportunity to clarify the specificity of the predictive networks identified in our study. Here is a more detailed explanation of our findings and their implications.
To validate the specificity of the sustained attention network identified from CPM analysis, we calculated correlations between the network strength of positive and negative networks and performances from a neuropsychology battery (CANTAB) at each timepoint separately. CANTAB includes several tasks that measure various cognitive functions, such as sustained attention, inhibitory control, impulsivity, and working memory. We found that all positive and negative networks derived from Go and Successful stop trials significantly correlated with a behavioural assay of sustained attention – the rapid visual information processing (RVP) task – at ages 14 and 19 (all P values < 0.028). Age 23 had no RVP task data in the IMAGEN study. There were sporadic significant correlations between constructs such as delay aversion/impulsivity and negative network strength, for example, but the correlations with the RVP were always significant. This demonstrates that the strength of the sustained attention brain network was specifically and robustly correlated with a typical sustained attention task, rather than other cognitive measures. The results are described in the main text on Page 8 and shown in Supplementary materials (Pages 1 and 3) and Table S12.
In addition, we conducted a CPM analysis to predict ICV using gPPI under Failed stop trials. Our findings showed that positive, negative, and combined networks derived from Failed stop trials significantly predicted ICV: at age 14 (r = 0.10, P = 0.033; r = 0.19, P < 0.001; and r = 0.17, P < 0.001, respectively), at age 19 (r = 0.21; r = 0.18; and r = 0.21, all P < 0.001, respectively), and at age 23 (r = 0.33, r = 0.35, and r = 0.36, respectively, all P < 0.001). Similar results were obtained using a 5-fold CV and leave-site-out CV.
Our analysis further showed that task-related functional connectivity derived from Go trials, Successful Stop trials, and Failed Stop trials could predict sustained attention across three timepoints. However, the predictive performances of networks derived from Go trials were higher than those from Successful Stop and Failed Stop trials. This suggests that sustained attention is particularly crucial during Go trials when participants need to respond to the Go signal. In contrast, although Successful Stop and Failed Stop trials also require sustained attention, these tasks primarily involve inhibitory control along with sustained attention.
Taken together, these findings underscore the specificity of the predictive networks of sustained attention. We have updated these results in the Supplementary Materials (Pages 3-5 and Page 7):
Method
CPM analysis using Failed stop trials
We performed another CPM analysis on Failed stop trials, using the gPPI matrix obtained from the second GLM described in the main text. The CPM analysis was conducted using 10-fold CV, 5-fold CV and leave-site-out CV.
Results
CPM predictive performance under Failed stop trials
Positive, negative, and combined networks derived from Failed stop trials significantly predicted ICV: at age 14 (r = 0.10, P = 0.033; r = 0.19, P < 0.001; and r = 0.17, P < 0.001, respectively), at age 19 (r = 0.21; r = 0.18; and r = 0.21, all P < 0.001, respectively), and at age 23 (r = 0.33, r = 0.35, and r = 0.36, respectively, all P < 0.001). We obtained similar results using a 5-fold CV and leave-site-out CV (Table S6).
Discussion
Specificity of the prediction of predictive networks
We found that task-related functional connectivity derived from Go trials, Successful stop trials, and Failed stop trials successfully predicted sustained attention across three timepoints. However, predictive performances of predictive networks derived from Go trials were higher than those derived from Successful stop trials and Failed stop trials. These results suggest that sustained attention is particularly crucial during Go trials when participants need to respond to the Go signal. In contrast, although Successful Stop and Failed Stop trials also require sustained attention, these tasks primarily involve inhibitory control along with sustained attention.
(5) The author used PPI to define the connectivity of the network. I am not sure why the author used two GLMs for the PPI analysis separately. In the second GLM, Go trials were treated as an implicit baseline. What does this exactly mean? And the gPPI analysis across the entire brain using the Shen atlas is not clear. Normally, as I understand, the PPI/gPPI is conducted to test the task-modulated connectivity between one seed region and the voxels of the whole rest brain. Did the author perform the PPI for each ROI from Shen atlas? More details about how to use PPI to construct the network are required.
Thank you for your insightful questions. Here, we’d like to clarify how we applied generalized PPI across the whole brain using the Shen atlas and why we used two separate GLMs for the gPPI analysis.
Yes, PPI is conducted to test the task-modulated connectivity between one seed region and other brain areas. This method can be both voxel-based and ROI-based. In our study, we performed ROI-based gPPI analysis using the Shen atlas with 268 regions. Specifically, we performed the PPI on each seed region of interest (ROI) to estimate the task-related FC between this ROI and the remaining 267 ROIs under a specific task condition. By performing this analysis across each ROI in the Shen atlas, we generated a 268 × 268 gPPI matrix for each task condition. The matrices were then transposed and averaged with the original matrices, yielding symmetrical matrices that were subsequently used for CPM analysis.
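The symmetrization step described above (averaging each gPPI matrix with its transpose) can be illustrated with a short sketch. The random matrix stands in for a real 268 × 268 gPPI matrix, and the function and variable names are hypothetical; extracting the upper triangle as the edge vector is a common convention assumed here, not something stated in the response.

```python
import numpy as np


def symmetrize_gppi(gppi):
    """Average a seed-by-target gPPI matrix with its transpose."""
    gppi = np.asarray(gppi, dtype=float)
    if gppi.ndim != 2 or gppi.shape[0] != gppi.shape[1]:
        raise ValueError("expected a square ROI-by-ROI matrix")
    return (gppi + gppi.T) / 2.0


# Placeholder for one participant's gPPI matrix (Shen atlas, 268 ROIs).
rng = np.random.default_rng(0)
raw_gppi = rng.normal(size=(268, 268))

sym_gppi = symmetrize_gppi(raw_gppi)

# Assumed convention: keep each ROI pair once by taking the upper triangle.
upper = np.triu_indices(268, k=1)
edge_vector = sym_gppi[upper]
```

After symmetrization each ROI pair contributes a single value, so a 268-region atlas yields 268 × 267 / 2 unique edges per participant and condition.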
Regarding the use of two separate GLMs for the gPPI analysis, our study aimed to define the task-related FC under two conditions: Go trials and Successful stop trials. The first GLM including Go trials was built to estimate the gPPI during Go trials. However, due to the high frequency of Go trials in the stop signal task, it is common to regard the Go trials as an implicit baseline, as in previous IMAGEN studies (D'Alberto et al., 2018; Whelan et al., 2012). Therefore, to achieve a more accurate estimation of FC during Successful stop trials, we built a second GLM specifically for these trials. Accordingly, we have updated it in the Method Section in the main text on Page 16.
Method
2.5 Generalized psychophysiological interaction (gPPI) analysis
In this study, we adopted gPPI analysis to generate task-related FC matrices and applied CPM analysis to investigate predictive brain networks from adolescents to young adults. PPI analysis describes task-dependent FC between brain regions, traditionally examining connectivity between a seed region of interest (ROI) and the voxels of the rest of the brain. However, this study conducted a generalized PPI analysis on an ROI-to-ROI basis (Di et al., 2021) to yield a gPPI matrix across the whole brain instead of just a single seed region.
Given the high frequency of Go trials in the SST, Go trials have commonly been treated as an implicit baseline in previous IMAGEN studies (D'Alberto et al., 2018; Whelan et al., 2012). Hence, we built a separate GLM for Successful stop trials, which included two task regressors (Failed and Successful stop trials) and 36 nuisance regressors.
(6) Why did the author use PPI to construct the network, rather than the other similar methods, for example, beta series correlation (BSC)?
Thanks for your question. PPI is an approach used to calculate the functional connectivity (FC) under a specific task (i.e., task-related FC). Although most brain connectomic research has utilized resting-state FC (e.g., beta series correlation), FC during task performance has demonstrated superiority in predicting individual behaviours and traits, due to its potential to capture more behaviourally relevant information (Dhamala et al., 2022; Greene et al., 2018; Yoo et al., 2018). Specifically, Zhao et al. (2023) suggested that task-related FC outperforms both typical task-based and resting-state FC in predicting individual differences. Therefore, we chose to use task-related FC to predict sustained attention over time. We have updated it in the Introduction on Page 5.
Introduction
Although most brain connectomic research has utilized resting-state fMRI data, functional connectivity (FC) during task performance has demonstrated superiority in predicting individual behaviours and traits, due to its potential to capture more behaviourally relevant information (Dhamala et al., 2022; Greene et al., 2018; Yoo et al., 2018). Specifically, Zhao et al. (2023) suggested that task-related FC outperforms both typical task-based and resting-state FC in predicting individual differences. Hence, we applied task-related FC to predict sustained attention over time.
(7) In the section of 'Correlation analysis between the network strength and substance use', the author just described that 'the correlations between xx and xx are shown in Fig5X', and repeated it three times for three correlation results. What exactly are the results? The author should describe the results in detail. And I am wondering whether there are scatter plots for these correlation analyses?
We’d like to clarify the results in Fig. 5. Fig. 5 illustrates the significant correlations between behaviour and brain activity associated with sustained attention and cigarette and cannabis use (Cig+CB) after FDR correction. Panel A shows the significant correlation between behaviour level of sustained attention and Cig+CB. Panels B and C show the correlations between brain activity associated with sustained attention and Cig+CB. While Panel B presents the brain activity derived from Go trials, Panel C presents brain activity derived from Successful stop trials. In response to your suggestion, we have described these results in detail on Page 9. We have also included scatter plots for the significant correlations shown in Fig. 5 in the Supplementary materials (Fig. S10).
Results
(6) Correlation between behaviour and brain to cannabis and cigarette use
Figs. 5A-C summarize the results showing the correlation between ICV/brain activity and Cig+CB per timepoint and across timepoints. Fig. 5A shows correlations between ICV and Cig+CB (Tables S14-15). ICV was correlated with Cig+CB at ages 19 (Rho = 0.13, P < 0.001) and 23 (Rho = 0.17, P < 0.001). ICV at ages 14 (Rho = 0.13, P = 0.007) and 19 (Rho = 0.13, P = 0.0003) were correlated with Cig+CB at age 23. Cig+CB at age 19 was correlated with ICV at age 23 (Rho = 0.13, P = 9.38E-05). Fig. 5B shows correlations between brain activity derived from Go trials and Cig+CB (Tables S18-19). Brain activities of positive and negative networks derived from Go trials were correlated with Cig+CB at age 23 (positive network: Rhop = 0.12, P < 0.001; negative network: Rhon = -0.11, P < 0.001). Brain activity of the negative network derived from Go trials at age 14 was correlated with Cig+CB at age 23 (Rhon = -0.16, P = 0.001). Cig+CB at age 19 was correlated with brain activity of the positive network derived from Go trials at age 23 (Rhop = 0.10, P = 0.002). Fig. 5C shows the correlations between brain activity derived from Successful stop trials and Cig+CB (Tables S18-19). Brain activities of positive and negative networks derived from Successful stop trials were correlated with Cig+CB at ages 19 (positive network: Rhop = 0.10, P = 0.001; negative network: Rhon = -0.08, P = 0.013) and 23 (positive network: Rhop = 0.13, P < 0.001; negative network: Rhon = -0.11, P = 0.001).
(8) Lastly, the labels of (A), (B) ... in the figure captions are unclear. The authors should find a better way to place the labels in the caption and keep them consistent throughout all figures.
Thank you for this valuable comment. We have revised the figure captions in the main text to ensure the labels (A), (B), etc., are placed more clearly and consistently across all figures.
Reviewer #2 (Public Review):
While the study largely achieves its aims, several points merit further clarification:
(1) Regarding connectome-based predictive modeling, an assumption is that connections associated with sustained attention remain consistent across age groups. However, this assumption might be challenged by observed differences in the sustained attention network profile (i.e., connections and related connection strength) across age groups (Figures 2 G-I, Fig. 3 G-I). It's unclear how such differences might impact the prediction results.
Thank you for your insightful comment. We’d like to clarify that we did not assume that connections associated with sustained attention remain completely consistent across age groups. Indeed, we expected that connections would change across age groups, due to the developmental changes in brain function and structure from adolescence to adulthood. Our focus was on the consistency of individual differences in sustained attention networks over time, recognising that the actual connections within those networks may change. However, we did show that there is some consistency in the specific connections associated with sustained attention over time. Notably, this consistency markedly increases when comparing ages 19 and 23, when developmental factors are less relevant. We support our reasoning above with the following analyses:
(1) Supplementary materials (Pages 2 and 5), relevant sections highlighted here for emphasis.
Method
Comparison of predictive networks identified at one timepoint versus another
Steiger’s Z value was employed to compare predictive performances of networks identified at different timepoints. This analysis involved comparing the r values derived from networks defined at distinct ages to predict ICV at the same age. For example, we compared the r values of brain networks defined at age 14 when predicting ICV at 19 (i.e., positive network: r = 0.25, negative network: r = 0.25, combined network: r = 0.28) with the r values of brain networks defined at age 19 itself (i.e., positive network: r = 0.16, negative network: r = 0.14, combined network: r = 0.16) derived from Go trials using Steiger's Z test (age 14 → age 19 vs. age 19 → age 19). Similarly, comparisons were made between networks defined at age 14 predicting ICV at age 23 and those at age 23 predicting ICV at age 23 (age 14 → age 23 vs. age 23 → age 23), as well as between networks defined at age 19 predicting ICV at age 23 and those at age 23 predicting ICV at age 23 (age 19 → age 23 vs. age 23 → age 23). These comparisons were performed separately for Go trials and Successful Stop trials.
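For readers unfamiliar with the test, the sketch below shows one common formulation of Steiger's (1980) Z for two dependent correlations that share a variable (here, observed ICV at a given age), using Fisher-transformed correlations and a pooled estimate of their covariance. The response does not state which variant or software the authors used, so this is an assumption; the correlation between the two sets of predictions (0.5) and the sample size (1000) in the example are hypothetical placeholders, not values from the study.

```python
import math
from statistics import NormalDist


def steiger_z(r_jk, r_jh, r_kh, n):
    """Compare two dependent correlations r_jk and r_jh sharing variable j.

    r_kh is the correlation between the two non-shared variables and
    n is the sample size. Returns (Z, two-sided p-value).
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)  # Fisher transform
    r_bar = (r_jk + r_jh) / 2.0
    psi = (r_kh * (1 - 2 * r_bar ** 2)
           - 0.5 * r_bar ** 2 * (1 - 2 * r_bar ** 2 - r_kh ** 2))
    c = psi / (1 - r_bar ** 2) ** 2  # covariance of the two Fisher z values
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical inputs: two networks predicting the same observed ICV.
z_stat, p_value = steiger_z(r_jk=0.28, r_jh=0.16, r_kh=0.5, n=1000)
```

The test requires the correlation between the two competing predictions in addition to each prediction's correlation with observed ICV, because both are evaluated against the same participants.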
Results
Comparison of predictive performance at different timepoints
For positive, negative, and combined networks predicting ICV derived from Go trials at age 19, the R values were higher when using predictive networks defined at 19 than those defined at 14 (Z = 3.79, Z = 3.39, Z = 3.99, all P < 0.00071). Similarly, the R values for positive, negative, and combined networks predicting ICV derived from Go trials at age 23 were higher when using predictive networks defined at age 23 compared to those defined at ages 14 (Z = 6.00, Z = 5.96, Z = 6.67, all P < 3.47e-9) or 19 (Z = 2.80, Z = 2.36, Z = 2.57, all P < 0.005).
At age 19, the R value for the positive network predicting ICV derived from Successful stop trials was higher when using predictive networks defined at 19 compared to those defined at 14 (Z = 1.54, P = 0.022), while the negative and combined networks did not show a significant difference (Z = 0.85, P = 0.398; Z = 2.29, P = 0.123). At age 23, R values for the positive and combined networks predicting ICV derived from Successful stop trials were higher when using predictive networks defined at 23 compared to those defined at 14 (Z = 3.00, Z = 2.48, all P < 3.47e-9) or 19 (Z = 2.52, Z = 1.99, all P < 0.005). However, the R value for the negative network at age 23 did not significantly differ when using predictive networks defined at 14 (Z = 1.80, P = 0.072) or 19 (Z = 1.48, P = 0.138).
These results indicate that some specific pairwise connections associated with sustained attention at earlier ages, such as 14 and 19, are still relevant as individuals grow older. However, some connections are not optimal for good sustained attention at older ages. That is, the brain reorganizes its connection patterns to maintain optimal functionality for sustained attention as it matures.
(2) Consistency of Individual Differences:
We found that individual differences in ICV were significantly correlated between the three timepoints (Fig. 1B). In addition, we calculated the correlations of network strength of the predictive networks predicting sustained attention derived from Go trials and Successful stop trials between each pair of timepoints. We found that the correlations of network strength for predictive networks (derived from Go trials and Successful stop trials) were also significant (all P < 0.003). We have updated these results in the main text (Pages 7-8) and Supplementary Materials (Table S7).
(2) Cross-sectional brain connectivity
In addition, we found that network strength of positive, negative, and combined networks derived from Go trials was significantly correlated between the three timepoints (Table S7, all P < 0.003).
In addition, we found that network strength of positive, negative, and combined networks derived from Successful stop trials was significantly correlated between the three timepoints (Table S7, all P < 0.001).
(3) Predictive networks across timepoints: Predictive networks defined at age 14 were successfully applied to predict ICV at ages 19 and 23. Similarly, predictive networks defined at age 19 were successfully applied to predict ICV at age 23 (Fig. 4). These results reflect the robustness of the brain network associated with sustained attention over time.
(4) Dice coefficient analysis: We calculated the Dice coefficient to quantify the similarity of predictive networks across the three timepoints. Connections in the sustained attention networks were significantly similar from ages 14 to 23 (Table S13), despite relatively few overlapping edges over time (as discussed in Supplementary Materials on Page 6).
(5) Global brain activation: Based on these findings, we indicate that sustained attention relies on global brain activation (i.e., network strength) rather than specific regions or networks (see also (Zhao et al., 2021)).
In summary, brain network connections undergo change and are not completely consistent across time. However, individual differences in sustained attention and its network are consistent across time, as we found that 1) the brain reorganizes its connection patterns to maintain optimal functionality for sustained attention as it matures. 2) ICV and network strength of sustained attention network were significantly correlated between each timepoint. 3) Sustained attention networks identified from previous timepoints could predict ICV in the subsequent timepoint. 4) Dice coefficient analysis indicated that the edges in the sustained attention networks were significantly similar from ages 14 to 23. 5) Sustained attention networks function as a global activation, rather than specific regions or networks.
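The Dice coefficient mentioned in point (4) can be sketched as follows. This is a minimal illustration, assuming each predictive network is represented as a collection of edges written as `(i, j)` node-index pairs; the function name `dice_coefficient` and the convention of returning 0 for two empty networks are assumptions, and the significance test reported in Table S13 is not reproduced here.

```python
def dice_coefficient(edges_a, edges_b):
    """Dice similarity between two edge sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        # Assumed convention: two empty networks have zero overlap
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical edge lists for networks defined at two timepoints
network_age14 = [(0, 1), (0, 2), (1, 3)]
network_age23 = [(0, 1), (1, 3), (2, 4), (3, 4)]
overlap = dice_coefficient(network_age14, network_age23)
```

Because the coefficient is normalised by the total number of edges in both networks, a small number of shared edges can still yield a value that differs reliably from chance when the networks themselves are sparse.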
(2) Another assumption of the connectome-based predictive modeling is that the relationship between sustained attention network and substance use is linear and remains linear over development. Such linear evidence from either the literature or their data would be of help.
Thanks for your valuable suggestion. We'd like to clarify that while CPM assumes a linear relationship between brain and behaviour (Shen et al., 2017), it does not assume that the relationship between the sustained attention network and substance use remains linear over development.
Our approach in applying CPM to predict sustained attention across different timepoints was based on previous neuroimaging studies (Rosenberg et al., 2016; Rosenberg et al., 2020), which indicated linear associations between brain connectivity patterns and sustained attention using CPM analysis. These findings support the notion of a linear relationship between brain connectivity and sustained attention. In this study, we performed CPM analysis to identify predictive networks predicting sustained attention, not substance use and used the network strength of these predictive networks to represent sustained attention activity.
To examine the relationship between substance use and sustained attention, as well as its associated brain activity, we conducted correlation analyses and utilized a latent change score model instead of CPM analysis. This decision was informed by cross-sectional studies (Broyd et al., 2016; Lisdahl and Price, 2012) that consistently reported linear associations between substance use and impairments in sustained attention. Additionally, longitudinal research by Harakeh et al. (2012) indicated a linear relationship between poorer sustained attention and the initiation and escalation of substance use over time.
Given these previous findings, we assumed a linear relationship between sustained attention and substance use. Our analyses included calculating correlations between substance use and sustained attention, as well as its associated brain activity at each timepoint and across timepoints (Fig. 5). Furthermore, we employed a three-wave bivariable latent change score model, a longitudinal approach, to assess the relationship between substance use and the behaviour and brain activity associated with sustained attention (Figs. 6-7). We have added more information to the Introduction (Page 6) to make this clearer.
Introduction
Additionally, previous cross-sectional and longitudinal studies (Broyd et al., 2016; Harakeh et al., 2012; Lisdahl and Price, 2012) have shown that there are linear relationships between substance use and sustained attention over time. We therefore employed correlation analyses and a latent change score model to estimate the relationship between substance use and both behaviours and brain activity associated with sustained attention.
(3) Heterogeneity in results suggests individual variability that is not fully captured by group-level analyses. For instance, Figure 1A shows decreasing ICV (better-sustained attention) with age on the group level, while there are both increasing and decreasing patterns on the individual level via visual inspection. Figure 7 demonstrates another example in which the group with a high level of sustained attention has a lower risk of substance use at a later age compared to that in the group with a low level of sustained attention. However, there are individuals in the high sustained attention group who have substance use scores as high as those in the low sustained attention group. This is important to take into consideration and could be a potential future direction for research.
Thanks for this valuable comment. We appreciate your observation regarding the individual variability that is not fully captured by group-level analyses to some degree. Fig. 1A shows the results from a linear mixed model, which explains group-level changes over time while accounting for the random effect within subjects. Similarly, Fig. 7 shows the group-level association between substance use and sustained attention. We agree that future research could indeed consider individual variability. For example, participants could be categorized based on their consistent trajectories of ICV or substance use (i.e., keep decreasing/increasing) over multiple timepoints. We agree that incorporating individual-level analyses in the future could provide valuable insights and are grateful for your suggestion, which will inform our future research directions.
The above-mentioned points might partly explain the significant but low correlations between the observed and predicted ICV as shown in Figure 4. Addressing these limitations would help enhance the study's conclusions and guide future research efforts.
We have updated the text in the Discussion on Page 13:
Discussion
However, there is still some individual variability not captured in this study, which could be attributed to the diversity in genetic, environmental, and developmental factors influencing sustained attention and substance use. Future research should aim to explore this variability in greater depth to gain a better understanding of the relationship between sustained attention and substance use.
Reviewer #3 (Public Review):
Weaknesses: It's questionable whether the prediction approach (i.e., CPM), even when combined with longitudinal data, can establish causality. I recommend removing the term 'consequence' in the abstract and replacing it with 'predict'. Additionally, the paper could benefit from enhanced rigor through additional analyses, such as testing various thresholds and conducting lagged effect analyses with covariate regression.
Thank you for your comment. We have replaced “consequence” by “predict” in the abstract.
Abstract
Previous studies were predominantly cross-sectional or under-powered and could not indicate if impairment in sustained attention was a predictor of substance-use or a marker of the inclination to engage in such behaviour.
Reviewer #3 (Recommendations For The Authors):
(1) The connectivity analysis predicts both baseline and longitudinal attention measures. However, given the high correlation in attention abilities across the three time-points, it's unclear whether the connectivity predicts shared variations of attention across three time points. It would be insightful to assess if predictions at the 2nd and 3rd-time points remained significant after controlling for attention abilities at the initial time point.
Thanks for your comments. We performed the CPM analysis to predict ICV at the 2nd and 3rd timepoints, controlling for ICV at age 14 as a covariate. After controlling for ICV at age 14, positive, negative, and combined networks derived from Successful stop trials defined at age 14 still predicted ICV at ages 19 and 23, and those defined at age 19 still predicted ICV at age 23. Positive, negative, and combined networks derived from Go trials defined at age 19 also still predicted ICV at age 23. However, positive, negative, and combined networks derived from Go trials defined at age 14 had lower predictive performance in predicting ICV at ages 19 and 23 after controlling for ICV at age 14. Notably, controlling for ICV at the initial timepoint did not significantly impact the performance of predictive networks derived from Successful stop trials. Accordingly, we have added this analysis and the results in the Supplementary Materials (Pages 3 and 5).
Method
Prediction across timepoints controlling for ICV at age 14
To examine whether connectivity predicts shared variation in sustained attention across timepoints, we applied predictive models developed at ages 14 and 19 to predict ICV at subsequent timepoints, controlling for ICV at age 14. Specifically, we used predictive models (including parameters and selected edges) developed at age 14 to predict ICV at ages 19 and 23 separately. First, we calculated the network strength using the gPPI matrix at ages 19 and 23 based on the selected edges identified from CPM analysis at age 14. We then estimated the predicted ICV at ages 19 and 23 by applying the linear model parameters (slope and intercept) obtained from CPM analysis at age 14 to the network strength. Finally, we evaluated the predictive performance by calculating the partial correlation between the predicted and observed values at ages 19 and 23, controlling for ICV at age 14. Similarly, we applied models developed at age 19 to predict ICV at age 23, also controlling for ICV at age 14. To assess the significance of the predictive performance, we used a permutation test, shuffling the predicted ICV values and recalculating the partial correlation to generate a random distribution over 1,000 iterations.
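The steps above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the authors' code: the helper names (`network_strength`, `apply_model`, `partial_corr`, `permutation_test`) are hypothetical, the partial correlation uses the first-order formula for a single covariate, and the one-sided p value with a +1 correction is an assumed convention that the response does not specify.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def network_strength(subject_edges, selected):
    """Sum one subject's connectivity values over the selected edge indices."""
    return sum(subject_edges[i] for i in selected)


def apply_model(strengths, slope, intercept):
    """Apply linear parameters fitted at an earlier timepoint."""
    return [slope * s + intercept for s in strengths]


def partial_corr(x, y, z):
    """Correlation of x and y controlling for a single covariate z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def permutation_test(predicted, observed, covariate, n_perm=1000, seed=0):
    """Shuffle predicted values to build a null distribution of partial r."""
    rng = random.Random(seed)
    r_obs = partial_corr(predicted, observed, covariate)
    shuffled = list(predicted)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if partial_corr(shuffled, observed, covariate) >= r_obs:
            exceed += 1
    # Assumed convention: one-sided p with +1 correction
    return r_obs, (exceed + 1) / (n_perm + 1)
```

In this sketch the covariate would be ICV at age 14, the observed values would be ICV at age 19 or 23, and the predicted values would come from `apply_model` applied to the later-timepoint network strength.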
Results
Predictions across timepoints controlling for ICV at age 14
Positive and combined networks derived from Go trials defined at age 14 predicted ICV at age 19 (r = 0.10, P = 0.028; r = 0.08, P = 0.047), but the negative network did not (r = 0.06, P = 0.119). The positive network derived from Go trials defined at age 14 predicted ICV at age 23 (r = 0.11, P = 0.013), but the negative and combined networks did not (r = 0.04, P = 0.187; r = 0.08, P = 0.056). Positive, negative, and combined networks derived from Go trials defined at age 19 predicted ICV at age 23 (r = 0.22, r = 0.19, and r = 0.22, respectively, all P < 0.001).
Positive, negative, and combined networks derived from Successful stop trials defined at age 14 predicted ICV at age 19 (r = 0.08, P = 0.036; r = 0.10, P = 0.012; r = 0.11, P = 0.009) and 23 (r = 0.11, P = 0.005; r = 0.13, P = 0.005; r = 0.13, P = 0.017) respectively. Positive, negative, and combined networks derived from Successful stop trials defined at age 19 predicted ICV at age 23 (r = 0.18, r = 0.18, and r = 0.17, respectively, all P < 0.001).
(2) In the Results section, a significance threshold of p = 0.01 was used for the CPM analysis. It would be beneficial to test the stability of these findings using alternative thresholds such as p = 0.05 or p = 0.005.
We appreciate this insightful comment and the suggestion to test the stability of our findings using alternative significance thresholds. Indeed, we have already conducted CPM analyses using a range of thresholds, including 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, and 0.0001 (see Table S8 in Supplementary Materials). The results were similar across different thresholds. Following prior studies (Feng et al., 2024; Ren et al., 2021; Yoo et al., 2018) which used P < 0.01 for feature selection, we chose to focus on the threshold of P < 0.01 for our main analysis. Following your suggestion, we have highlighted this in the Method section on Pages 17-18.
Method
2.6.1 ICV prediction
The r value with an associated P value for each edge was obtained, and a threshold P = 0.01 (Feng et al., 2024; Ren et al., 2021; Yoo et al., 2018) was set to select edges.
2.6.2 Three cross-validation schemes
In addition, we conducted the CPM analysis using a range of thresholds for feature selection and observed similar results across different thresholds (See Supplementary Materials Table S8).
(3) Could you clarify if you used one sub-sample to extract connectivity related to sustained attention and then used another sub-sample to predict substance use with attention-related connectivity?
Thank you very much for the question. We used the same sample to extract the brain network strength and estimated the correlation with substance use using both the Spearman correlation and latent change score model across three timepoints. We controlled for covariates including sex, age, and scan site at the same time. Accordingly, we have clarified this in the Method section on Page 20. We note that the CPM analyses were conducted using cross-validation, plus a leave-site-out analysis.
Method
2.7.3 Correlation between network strength and substance use
It is worth noting that all the correlations between substance use and sustained attention were conducted using the same sample across three timepoints.
(4) Could you clarify whether you have regressed covariates in the lagged effects analysis of part 7?
Thanks for this question. Yes, we confirm that we controlled for covariates including age, sex, and scan site in the latent change score model. We have now described them more clearly in the Method section (Page 18).
Method
2.7.3 Correlation between network strength and substance use
Additionally, cross-lagged dynamic coupling (i.e., bidirectionality) was employed to explore individual differences in the relationships between substance use and linear changes in ICV/brain activity, as well as the relationship between ICV/brain activity and linear change in substance use. The model accounted for covariates such as age, sex and scan sites.
References:
Broyd, S.J., van Hell, H.H., Beale, C., Yucel, M., Solowij, N., 2016. Acute and Chronic Effects of Cannabinoids on Human Cognition-A Systematic Review. Biol Psychiatry 79, 557-567.
Chamberlain, S.R., Odlaug, B.L., Schreiber, L.R.N., Grant, J.E., 2012. Association between Tobacco Smoking and Cognitive Functioning in Young Adults. The American Journal on Addictions 21, S14-S19.
Crean, R.D., Crane, N.A., Mason, B.J., 2011. An evidence based review of acute and long-term effects of cannabis use on executive cognitive functions. J Addict Med 5, 1-8.
D'Alberto, N., Chaarani, B., Orr, C.A., Spechler, P.A., Albaugh, M.D., Allgaier, N., Wonnell, A., Banaschewski, T., Bokde, A.L.W., Bromberg, U., Buchel, C., Quinlan, E.B., Conrod, P.J., Desrivieres, S., Flor, H., Frohner, J.H., Frouin, V., Gowland, P., Heinz, A., Itterman, B., Martinot, J.L., Paillere Martinot, M.L., Artiges, E., Nees, F., Papadopoulos Orfanos, D., Poustka, L., Robbins, T.W., Smolka, M.N., Walter, H., Whelan, R., Schumann, G., Potter, A.S., Garavan, H., 2018. Individual differences in stop-related activity are inflated by the adaptive algorithm in the stop signal task. Hum Brain Mapp 39, 3263-3276.
Dhamala, E., Yeo, B.T.T., Holmes, A.J., 2022. Methodological Considerations for Brain-Based Predictive Modelling in Psychiatry. Biological Psychiatry.
Di, X., Zhang, Z.G., Biswal, B.B., 2021. Understanding psychophysiological interaction and its relations to beta series correlation. Brain Imaging and Behavior 15, 958-973.
Dougherty, D.M., Mathias, C.W., Dawes, M.A., Furr, R.M., Charles, N.E., Liguori, A., Shannon, E.E., Acheson, A., 2013. Impulsivity, attention, memory, and decision-making among adolescent marijuana users. Psychopharmacology (Berl) 226, 307-319.
Esterman, M., Rothlein, D., 2019. Models of sustained attention. Curr Opin Psychol 29, 174-180.
Feng, Q., Ren, Z., Wei, D., Liu, C., Wang, X., Li, X., Tie, B., Tang, S., Qiu, J., 2024. Connectome-based predictive modeling of Internet addiction symptomatology. Soc Cogn Affect Neurosci 19.
Greene, A.S., Gao, S., Scheinost, D., Constable, R.T., 2018. Task-induced brain state manipulation improves prediction of individual traits. Nature Communications 9, 2807.
Harakeh, Z., de Sonneville, L., van den Eijnden, R.J., Huizink, A.C., Reijneveld, S.A., Ormel, J., Verhulst, F.C., Monshouwer, K., Vollebergh, W.A., 2012. The association between neurocognitive functioning and smoking in adolescence: the TRAILS study. Neuropsychology 26, 541-550.
Hart, C.L., van Gorp, W., Haney, M., Foltin, R.W., Fischman, M.W., 2001. Effects of acute smoked marijuana on complex cognitive performance. Neuropsychopharmacology 25, 757-765.
Lawrence, N.S., Ross, T.J., Stein, E.A., 2002. Cognitive mechanisms of nicotine on visual attention. Neuron 36, 539-548.
Lisdahl, K.M., Price, J.S., 2012. Increased marijuana use and gender predict poorer cognitive functioning in adolescents and emerging adults. J Int Neuropsychol Soc 18, 678-688.
O'Halloran, L., Cao, Z.P., Ruddy, K., Jollans, L., Albaugh, M.D., Aleni, A., Potter, A.S., Vahey, N., Banaschewski, T., Hohmann, S., Bokde, A.L.W., Bromberg, U., Buchel, C., Quinlan, E.B., Desrivieres, S., Flor, H., Frouin, V., Gowland, P., Heinz, A., Ittermann, B., Nees, F., Orfanos, D.P., Paus, T., Smolka, M.N., Walter, H., Schumann, G., Garavan, H., Kelly, C., Whelan, R., 2018. Neural circuitry underlying sustained attention in healthy adolescents and in ADHD symptomatology. Neuroimage 169, 395-406.
Potter, A.S., Newhouse, P.A., 2008. Acute nicotine improves cognitive deficits in young adults with attention-deficit/hyperactivity disorder. Pharmacol Biochem Behav 88, 407-417.
Ren, Z., Daker, R.J., Shi, L., Sun, J., Beaty, R.E., Wu, X., Chen, Q., Yang, W., Lyons, I.M., Green, A.E., Qiu, J., 2021. Connectome-Based Predictive Modeling of Creativity Anxiety. Neuroimage 225, 117469.
Rosenberg, M.D., Finn, E.S., Scheinost, D., Papademetris, X., Shen, X., Constable, R.T., Chun, M.M., 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nat Neurosci 19, 165-171.
Rosenberg, M.D., Scheinost, D., Greene, A.S., Avery, E.W., Kwon, Y.H., Finn, E.S., Ramani, R., Qiu, M., Constable, R.T., Chun, M.M., 2020. Functional connectivity predicts changes in attention observed across minutes, days, and months. Proc Natl Acad Sci U S A 117, 3797-3807.
Shen, X., Finn, E.S., Scheinost, D., Rosenberg, M.D., Chun, M.M., Papademetris, X., Constable, R.T., 2017. Using connectome-based predictive modeling to predict individual behavior from brain connectivity. Nat Protoc 12, 506-518.
Valentine, G., Sofuoglu, M., 2018. Cognitive Effects of Nicotine: Recent Progress. Curr Neuropharmacol 16, 403-414.
Verbruggen, F., Aron, A.R., Band, G.P.H., Beste, C., Bissett, P.G., Brockett, A.T., Brown, J.W., Chamberlain, S.R., Chambers, C.D., Colonius, H., Colzato, L.S., Corneil, B.D., Coxon, J.P., Dupuis, A., Eagle, D.M., Garavan, H., Greenhouse, I., Heathcote, A., Huster, R.J., Jahfari, S., Kenemans, J.L., Leunissen, I., Li, C.S.R., Logan, G.D., Matzke, D., Morein-Zamir, S., Murthy, A., Pare, M., Poldrack, R.A., Ridderinkhof, K.R., Robbins, T.W., Roesch, M.R., Rubia, K., Schachar, R.J., Schall, J.D., Stock, A.K., Swann, N.C., Thakkar, K.N., van der Molen, M.W., Vermeylen, L., Vink, M., Wessel, J.R., Whelan, R., Zandbelt, B.B., Boehler, C.N., 2019. A consensus guide to capturing the ability to inhibit actions and impulsive behaviors in the stop-signal task. Elife 8.
Whelan, R., Conrod, P.J., Poline, J.B., Lourdusamy, A., Banaschewski, T., Barker, G.J., Bellgrove, M.A., Buchel, C., Byrne, M., Cummins, T.D., Fauth-Buhler, M., Flor, H., Gallinat, J., Heinz, A., Ittermann, B., Mann, K., Martinot, J.L., Lalor, E.C., Lathrop, M., Loth, E., Nees, F., Paus, T., Rietschel, M., Smolka, M.N., Spanagel, R., Stephens, D.N., Struve, M., Thyreau, B., Vollstaedt-Klein, S., Robbins, T.W., Schumann, G., Garavan, H., Consortium, I., 2012. Adolescent impulsivity phenotypes characterized by distinct brain networks. Nat Neurosci 15, 920-925.
Yoo, K., Rosenberg, M.D., Hsu, W.T., Zhang, S., Li, C.R., Scheinost, D., Constable, R.T., Chun, M.M., 2018. Connectome-based predictive modeling of attention: Comparing different functional connectivity features and prediction methods across datasets. Neuroimage 167, 11-22.
Young, J.W., Finlayson, K., Spratt, C., Marston, H.M., Crawford, N., Kelly, J.S., Sharkey, J., 2004. Nicotine improves sustained attention in mice: evidence for involvement of the alpha7 nicotinic acetylcholine receptor. Neuropsychopharmacology 29, 891-900.
Zhao, W., Makowski, C., Hagler, D.J., Garavan, H.P., Thompson, W.K., Greene, D.J., Jernigan, T.L., Dale, A.M., 2023. Task fMRI paradigms may capture more behaviorally relevant information than resting-state functional connectivity. Neuroimage, 119946.
Zhao, W., Palmer, C.E., Thompson, W.K., Chaarani, B., Garavan, H.P., Casey, B.J., Jernigan, T.L., Dale, A.M., Fan, C.C., 2021. Individual Differences in Cognitive Performance Are Better Predicted by Global Rather Than Localized BOLD Activity Patterns Across the Cortex. Cereb Cortex 31, 1478-1488.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This paper provides potentially useful insight into why memory consolidation may differ between children (5-7 years of age) and adults. The work hints at developmental differences in neural engagement during the retrieval of recent and remote memories. However, there are several major issues with the experimental design and analyses that render the evidence supporting the authors' main claims incomplete.
-
Reviewer #1 (Public Review):
Summary:
This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though in this revised paper I still have substantive concerns about how the analyses were performed. While scene-specific reinstatement decreased for remote memories in both children and adults, claims about its presence cannot be made given the analyses. Gist-level reinstatement was observed in children but not adults, but I also have concerns about this analysis. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and the paper takes a step towards characterizing how.
Strengths:
The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.
Weaknesses:
As noted above and in my review of the original submission, the pattern similarity analysis for both item and category-level reinstatement were performed in a way that is not interpretable given concerns about temporal autocorrelation within scanning run. Unfortunately these issues remain of concern in this revision because they were not rectified. Most of my review focuses on this analytic issue, though I also outline additional concerns.
(1) The pattern similarity analyses are largely uninterpretable due to how they were performed.
(a) First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, and which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, which is not possible given the design.
To remedy this, in the revision the authors have said they will refrain from making conclusions about the presence of scene-specific reinstatement (i.e., reinstatement above baseline). While this itself is an improvement from the original manuscript, I still have several concerns. First, this was not done thoroughly and at times conclusions/interpretations still seem to imply or assume the presence of scene reinstatement (e.g., line 979-985, "our research supports the presence of scene-specific reinstatement in 5-to-7-year-old children"; line 1138). Second, the authors' logic for the neural-behavioural correlations in the PLSC analysis involved restricting to regions that showed significant reinstatement for the gist analysis, which cannot be done for the analogous scene-specific reinstatement analysis. This makes it challenging to directly compare these two analyses since one was restricted to a small subset of regions and only children (gist), while scene reinstatement included both groups and all ROIs. Third, it is also unclear whether children and adults' values should be directly comparable given pattern similarity can be influenced by many factors like motion, among other things.
My fourth concern with this analysis relates to the lack of regional specificity of the effects. All ROIs tested showed a virtually identical pattern: "Scene-specific reinstatement" decreased across delays, and was greater in children than adults. I believe control analyses are needed to ensure artifacts are not driving these effects. This would greatly strengthen the authors' ability to draw conclusions from the "clean" comparison of day 1 vs. day 14. (A) The authors should present results from a control ROI that should absolutely not show memory reinstatement effects (e.g., white matter?). Results from the control ROI should look very different - should not differ between children and adults, and should not show decreases over time. (B) Do the recent items from day 1 vs. day 14 differ? If so, this could suggest something is different about the later scans (and if not, it would be reassuring). (C) If the same analysis was performed comparing the object cue and immediately following fixation (rather than the fixation and the immediately following scene), the results should look very different. I would argue that this should not be an index of reinstatement at all since it involves something presented visually rather than something reinstated (i.e., the scene picture is not included in this comparison). If this control analysis were to show the same effects as the primary analysis, this would be further evidence that this analysis is uninterpretable and hopelessly confounded.
(b) For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). The authors in their response letter have indicated that because the patterns being correlated are not derived from events in close temporal proximity, they should not suffer from the issue of temporal autocorrelation. This is simply not true. For example, see the paper by Prince et al. (eLife 2022; on GLMsingle). This is not the main point of Prince et al.'s paper, but it includes a nice figure that shows that, using standard modelling approaches, the correlation between (same-run) patterns can be artificially elevated for lags as long as ~120 seconds (and can even be artificially reduced after that; Figure 5 from that paper) between events. This would affect many of the comparisons in the present paper. The cleanest way to proceed is to simply drop the within-run comparisons, which I believe the authors can do and yet they have not. Relatedly, in the response letter the authors say they are focusing mainly on the change over time for reinstatement at both levels including the gist-type reinstatement; however, this is not how it is discussed in the paper. They in fact are mainly relying on differences from zero, as children show some "above baseline" reinstatement while adults do not, but I believe there were no significant differences over time (i.e., the findings the authors said they would lean on primarily, as they are arguably the most comparable). (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. In their response letter and the revised paper, the authors do provide a bit of reasoning as to why this is the most sensible. 
However, it is still not clear to me whether this is really "reinstatement" which (in my mind) entails the re-evoking of a neural pattern initially engaged during perception. Rather, could this be a shared neural state that is category specific? In any case, I think additional information should be added to the text to clarify that this definition differs from others in the literature. The authors might also consider using some term other than reinstatement. Again (as I noted in my prior review), the finding of no category-level reinstatement in adults is surprising and confusing given prior work and likely has to do with the operationalization of "reinstatement" here. I was not quite sure about the explanation provided in the response letter, as category-level reinstatement is quite widespread in the brain for adults and is robust to differences in analytic procedures etc. (3) Also from a theoretical standpoint-I'm still a bit confused as to why gist-based reinstatement would involve reinstatement of the scene gist, rather than the object's location (on the screen) gist. Were the locations on the screen similar across scene backgrounds from the same category? It seems like a different way to define memory retrieval here would be to compare the neural patterns when cued to retrieve the same vs. similar (at the "gist" level) vs. different locations across object-scene pairs. This is somewhat related to a point from my review of the initial version of this manuscript, about how scene reinstatement is not necessary. The authors state that participants were instructed to reinstate the scene, but that does not mean they were actually doing it. The point that what is being measured via the reinstatement analyses is actually not necessary to perform the task should be discussed in more detail in the paper.
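The cross-run restriction recommended in point (b) can be sketched as follows. This is a minimal illustration, assuming one activity pattern per trial along with a run label and a category label; the function name `cross_run_similarity` and the data layout are hypothetical and do not reflect the authors' pipeline.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_run_similarity(patterns, runs, categories):
    """Mean same- vs. different-category similarity using cross-run pairs only."""
    same, diff = [], []
    for i, j in combinations(range(len(patterns)), 2):
        if runs[i] == runs[j]:
            # Skip within-run pairs, which are biased by temporal autocorrelation
            continue
        r = pearson(patterns[i], patterns[j])
        (same if categories[i] == categories[j] else diff).append(r)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(same), mean(diff)
```

A category-level index would then be the difference between the two returned means; if every trial comes from a single run, both means are undefined, which mirrors the point that the scene-specific analysis cannot be rescued this way.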
(2) Inspired by another reviewer's comment, it is unclear to me the extent to which age group differences can be attributed to differences in age/development versus memory strength. I liked the other reviewer's suggestions about how to identify and control for differences in memory strength, which I don't think the authors actually did in the revision. They instead showed evidence that memory strength does seem to be lower in children, which indicates this is an interpretive confound. For example, the reviewer's suggestion of performing analyses on subsets of participants who were actually matched in initial learning/memory performance would have been very informative. As it is, the authors didn't really control for memory strength adequately in my opinion, and as such their conclusions about children vs. adults could have been reframed as people with weak vs. strong memories. This is obviously a big drawback given what the authors want to conclude. Relatedly, I'm not sure the DDM was incorporated as the reviewer was suggesting; at minimum I think the authors need to do more work in the paper to explain what this means and why it is relevant. (I understand putting it in the supplement rather than the main paper, but I still wanted to know more about what it added from an interpretive perspective.)
(3) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). Precuneus also interestingly seems to show numerically recent>remote (values mostly negative), whereas most other regions show the opposite. This difference from zero (in either direction) or lack thereof seems important to the message. In response to this comment on the original manuscript, the authors seem to have confirmed that hippocampal activity was greater during retrieval than implicit baseline. But this was not really my question - I was asking whether hippocampus is (and other ROIs in this same figure are) differently engaged for recent vs. remote memories.
(4) Related to point 3, the claims about hippocampus with respect to multiple trace theory feel very unsupported by the data. I believe the authors want to conclude that children's memory retrieval shows reliance on hippocampus irrespective of delay, presumably because this is a detailed memory task. However the authors have not really shown this; all they have shown is that hippocampal involvement (whatever it is) does not vary by delay. But we do not have compelling evidence that the hippocampus is involved in this task at all. That hippocampus is more active during retrieval than implicit baseline is a very low bar and does not necessarily indicate a role in memory retrieval. If the authors want to make this claim, more data are needed (e.g., showing that hippocampal activity during retrieval is higher when the upcoming memory retrieval is successful vs. unsuccessful). In the absence of this, I think all the claims about multiple trace theory supporting retrieval similarly across delays and that this is operational in children are inappropriate and should be removed.
(5) There are still not enough methodological details in the main paper to make sense of the results. Some of these problems were addressed in the revision but others remain. For example, a couple of things that were unclear: that initially learned locations were split, where half were tested again at day 1 and the other half at day 14; what specific criterion was used to determine to pick the 'well-learned' associations that were used for comparisons at different delay periods (object-scene pairs that participants remembered accurately in the last repetition of learning? Or across all of learning?).
(6) I still find the revised Introduction a bit unclear. I appreciated the added descriptions of different theories of consolidation, though the order of presented points is still a bit hard to follow. Some of the predictions I also find a bit confusing as laid out in the introduction. (1) As noted in the paper multiple trace theory predicts that hippocampal involvement will remain high provided memories retained are sufficiently high detail. The authors however also predict that children will rely more on gist (than detailed) memories than adults, which would seem to imply (combined with the MTT idea) that they should show reduced hippocampal involvement over time (while in adults, it should remain high). However, the authors' actual prediction is that hippocampus will show stable involvement over time in both kids and adults. I'm having a hard time reconciling these points. (2) With respect to the extraction of gist in children, I was confused by the link to Fuzzy Trace Theory given the children in the present study are a bit young to be showing the kind of gist extraction shown in the Brainerd & Reyna data. Would 5-7 year olds not be more likely to show reliance on verbatim traces under that framework? Also from a phrasing perspective, I was confused about whether gist-like information was something different from just gist in this sentence: "children may be more inclined to extract gist information at the expense of detailed or gist-like information." (p. 8) - is this a typo?
(7) For the PLSC, if I understand this correctly, the profiles were defined for showing associations with behaviour across age groups. (1) As such, is it not "double dipping" to then show that there is an association between brain profile and behaviour? Must this not be true by definition? If I am mistaken, it might be helpful to clarify this in the paper. (2) In addition, I believe for the univariate and scene-specific reinstatement analyses these profiles were defined across both age groups. I assume this doesn't allow for separate definition of profiles across the two groups (i.e., a kind of "interaction"). If this is the case, it makes sense that there would not be big age differences... the profiles were defined for showing an association across all subjects. If the authors wanted to identify distinct profiles in children and adults they may need to run another analysis. (3) Also, as for differences between short delay brain profile and long delay brain profile for the scene-specific reinstatement - there are 2 regions that become significant at long delay that were not significant at a short delay (PC, and CE). However, given there are ceiling effects in behaviour at the long but not short delay, it's unclear if this is a meaningful difference or just a difference in sensitivity. Is there a way to test whether the profiles are statistically different from one another? (4) As I mentioned above, it also was not ideal in my opinion that all regions were included for the scene-specific reinstatement due to the authors' inability to have an appropriate baseline and therefore define above-chance reinstatement. It makes these findings really challenging to compare with the gist reinstatement ones.
(8) I would encourage the authors to be specific about whether they are measuring/talking about memory representations versus reinstatement, unless they think these are the same thing (in which case some explanation as to why would be helpful). For example, especially under the Fuzzy Trace framework, couldn't someone maintain both verbatim and gist traces of a memory yet rely more on one when making a memory decision?
(9) With respect to the learning criteria - it is misleading to say that "children needed between two to four learning-retrieval cycles to reach the criterion of 83% correct responses" (p. 9). Four was the maximum, and looking at the Figure 1C data it appears as though there were at least a few children who did not meet the 83% minimum. I believe they were included in the analysis anyway? Please clarify. Was there any minimum imposed for inclusion?
(10) For the gist-like reinstatement PLSC analysis, results are really similar at short and long delays and yet some of the text seems to imply specificity to the long delay. One is a trend and one is significant (p. 31), but surely these two associations would not be statistically different from one another?
(11) As a general comment, I had a hard time tying all of the (many) results together. For example adults show more mature neocortical consolidation-related engagement, which the authors say is going to create more durable detailed memories, but under multiple trace theory we would generally think of neocortical representations as providing more schematic information. If the authors could try to make more connections across the different neural analyses, as well as tie the neural findings in more closely with the behaviour & back to the theoretical frameworks, that would be really helpful.
-
Reviewer #2 (Public Review):
Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children.
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1 (Public Reviews):
Summary:
This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.
Strengths:
The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.
Weaknesses:
As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.
We thank the reviewer for both the positive and critical appraisal of our paper.
(1) The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).
a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.
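To make the recommended cross-run restriction concrete, the following is a minimal sketch in Python (NumPy) and not the authors' pipeline. The function name, the `patterns` array (trials by voxels), and the `runs` labels are hypothetical inputs introduced only for illustration. It averages pattern correlations over pairs of trials that come from different scanning runs and ignores every within-run pair.

```python
import numpy as np

def cross_run_similarity(patterns, runs):
    """Mean Pearson correlation over trial pairs drawn from different runs.

    patterns: (n_trials, n_voxels) array, one activity pattern per trial
              (hypothetical input, e.g. single-trial estimates in an ROI).
    runs:     (n_trials,) array of run labels, one per trial.
    """
    patterns = np.asarray(patterns, dtype=float)
    runs = np.asarray(runs)
    corr = np.corrcoef(patterns)                      # trial-by-trial correlations
    different_run = runs[:, None] != runs[None, :]    # True for cross-run pairs
    unique_pairs = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # each pair once
    mask = different_run & unique_pairs
    if not mask.any():
        raise ValueError("no cross-run pairs available")
    return corr[mask].mean()
```

With a design in which each target scene appears only once, as the reviewer notes, the mask for a scene-specific comparison would contain no usable pairs, which is why the restriction may not be applicable to that index.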
We thank the reviewer for this important input. We acknowledge that our study design leads to temporal autocorrelation in the BOLD signal when calculating RSA between fixation and scene time windows. We also recognize that we cannot interpret the significance of scene-specific reinstatement compared to zero and have accordingly removed this information. Nevertheless, our primary objective was to investigate changes in scene-specific reinstatement in relation to the different time delays of retrieval. Given that the retrieval procedure is the same over time and presumably similarly influenced by temporal autocorrelations, we argue that our results must be attributed to the relative differences in reinstatement across recent and remote trials. Bearing this in mind, we argue that our results can be interpreted in terms of delay-related changes in reinstatement. This information is discussed in pp. 21, 40 of the manuscript.
We agree with the reviewer that cross-run comparisons would be extremely interesting. This could be achieved by introducing the same items repeatedly across different runs, which was not possible in our current setup because we were interested in single-exposure retrieval and faced practical time restrictions when scanning children. We have introduced this idea in the Limitations and Discussion sections (pp. 40, 44) of the manuscript to inform future studies.
Finally, thanks to the reviewer’s comment, we identified a bug in the final steps of our RSA calculation. Fisher’s z-transformation was incorrectly applied to r-1 values, resulting in abnormally high values. We apologize for this error. We have revised the scripts and rectified the bug by correctly applying Fisher’s z-transformation to the r similarity values. We also adjusted the methods description figure accordingly (Figure 5, p. 22). This adjustment led to slightly altered reinstatement indices. Nevertheless, the overall pattern of delay-related attenuation in the scene-specific reinstatement index, observed in both children and adults, remains consistent. Similarly, we observed gist-like reinstatement uniquely in children.
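For readers unfamiliar with the transformation mentioned above, here is a minimal sketch of Fisher's z applied directly to Pearson r values, written in Python (NumPy). It is not the authors' script. The function names and the example correlations are hypothetical and serve only as an illustration. The transform is z = arctanh(r) and is finite only for -1 < r < 1, so the quantity it is applied to matters.

```python
import numpy as np

def fisher_z(r):
    """Fisher z-transform of Pearson correlations: z = arctanh(r)."""
    r = np.asarray(r, dtype=float)
    if np.any(np.abs(r) >= 1):
        raise ValueError("Fisher z is only finite for -1 < r < 1")
    return np.arctanh(r)

def mean_correlation(r):
    """Average correlations in z space, then map the mean back to r."""
    return np.tanh(fisher_z(r).mean())

# Hypothetical similarity values, for illustration only.
r_values = np.array([0.10, 0.30, 0.50])
z_values = fisher_z(r_values)
```

Averaging in z space and back-transforming with `tanh` is a common convention for summarizing correlations; whether the authors back-transform is not stated in the response.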
b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc.
Thank you for your insightful comments. We acknowledge the theoretical concerns raised about distinguishing between the effects of reactivation processes occurring during fixation and differential processing of the stimulus itself during scene presentation. Specifically, the notion that engagement levels with recent scenes could result in enhanced processing, which might be misattributed to memory reinstatement mechanisms.
We argue, however, that during scene presentation, scenes are processed more “memory-wise” rather than “perception-wise”, since both recent and remote memories are well-learned, as we included only correctly recalled memories in the analysis.
We concur that scene presentations entail perceptual processing; however, such processing would be consistent across all items, given that they were presented with the same repeated learning procedure, rendering them equally familiar to participants. In addition, we would argue that distinct activation patterns elicited during varying delays are more likely attributable to memory-related processing, since participants actively engaged in a memory-based decision-making task during these intervals. We have incorporated this rationale into the discussion section of our manuscript (p. 40).
With this in mind, we hypothesized that in case of “memory-wise” processing, the neural engagement during the scene time window should be higher for remote compared to recent items, and this increases with passing time as more control and effort should be exhibited during retrieval due to reorganized and distributed nature of memories. If the scenes are processed more “perception-wise”, we would expect higher neural engagement during the retrieval of recent compared to remote items. Our exploratory analysis (detailed overview in supplementary materials, Figure S3, Table S9) revealed a higher neural activation for remote compared to recent items in medial temporal, prefrontal, occipital and cerebellar brain regions, supporting the notion of “memory-wise” processes during scene time window. However, this exploratory analysis cannot provide a direct solution to the reviewer’s concern as our paradigm per se cannot arbitrate between “memory-wise” and “perception-wise” nature of retrieval. We added the point to the discussion (see p. 40).
c. For the category-based neural reinstatement:
(1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well.
(2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach.
(3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r = 1) or minimal dissimilarity (1 - Pearson r = 0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons considered; rather, they should be excluded, so the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2), and other times different scene types (forest1-field1).
(4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.
Thank you for this important input regarding category-based reinstatement.
(1) The distribution of within-category items across runs was approximately similar and balanced. Additionally, within runs, they were presented randomly without close temporal proximity. Based on this arrangement, we believe that the issue of close temporal autocorrelation, as pointed out by the reviewer in the context of scene-specific reinstatement, may not apply to the same extent here. Again, our focus is not on the absolute level of category-based reinstatement, but the relative difference across conditions (recent vs. remote short delay vs. remote long delay) which are equally impacted by the autocorrelations.
(2) We apologize for not motivating this analysis further. Whereas the scene-reinstatement index (i.e., fixation to scene correlation) gives us a measure of the pre-activation of a concrete scene (e.g., a yellow forest in autumn), the gist-like reinstatement gives us a measure of the pre-activation of a whole category of scenes (e.g., forests). Critically, our window of interest is the fixation period for both sets of analysis (in the absence of any significant visual input). The scene-specific reinstatement uses the scene window as a neural template against which the fixation period can be compared, while the gist-like reinstatement compares the similarity of reactivation patterns for trials that come from the same category but differ in the exact memory content. The reinstatement of more generic, gist-like memory (e.g., forest) across multiple trials should yield more similar neural activation patterns. Significant gist-like reinstatement would suggest that neural patterns for scenes within the same category are more generic, as indicated by higher similarity among them. On the other hand, a more detailed reinstatement of specific types of forests (e.g., a yellow forest in autumn, green pine trees, a bare-leaved forest in spring, etc.) that differ in various dimensions could result in neural activation patterns that are as dissimilar as those seen in the reinstatement of scenes from entirely different categories. Through this methodology, we could distinguish between more generic, gist-like reinstatement and more specific, detailed reinstatement. This is now clarified in the manuscript, see p. 25.
(3) We apologize for the confusion caused by the figure and analysis description. In our analysis, we indeed excluded the correlation of the fixation cross with itself. Consequently, the diagonal in the figure should be blank to indicate this. This is now revised in the manuscript (Figure 7B and in Methods).
(4) We appreciate your concern and recognize that the terminology we used might not align perfectly with the conventional understanding of category-based reinstatement. Typically, category-level neural representations (as discussed in Polyn et al., 2005; Jafarpour et al., 2014; among others) are investigated to identify specific brain areas associated with encoding/perception of scenes or faces. Our aim, however, was to explore the mnemonic reinstatement of highly detailed scenes that were elaborately encoded, with the hypothesis that substantial representational transformations would occur over time and vary with age. This hypothesis is based on the memory literature, including the Fuzzy-Trace Theory, the Contextual Binding Theory, and the Trace Transformation Theory (Brainerd & Reyna, 1998; Yonelinas, 2019; Moscovitch & Gilboa, 2023). Therefore, we renamed 'category-based' reinstatement to 'gist-like' reinstatement, which clarifies our concept and better aligns it with existing literature.
We anticipated that young adults, having the ability to retain detailed narratives post-encoding, would demonstrate a reinstatement of scenes with distinct details, making these scenes dissimilar from each other (see similar findings in Sommer et al., 2021). In contrast, given the anticipated lesser strategic elaboration during learning in children, we hypothesized that they would demonstrate a shallower, more gist-like reinstatement (for instance, children recalling a forest or a field in a general sense without specific details or vivid imagery). This could result in higher category-based similarity, as children might reinstate a more generic forest concept.
We did not gather additional data on the verbal quality of reinstatement due to the limited scanning time available for children, so these assumptions remain unverified. However, anecdotal observations post-retrieval indicated that adults often reported very vivid scenes associated with clear narrative recall. In contrast, children frequently described more vague memories (e.g., “I know it was a forest”) without specific details. Future studies should include measures to assess the quality of reinstatement, potentially outside the scanning environment.
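The within- versus between-category comparison discussed in points (2) and (3) above can be illustrated with a minimal sketch in Python (NumPy). This is an assumption-laden illustration and not the authors' implementation. The function name, the `patterns` array (one fixation-period pattern per trial), and the `categories` labels are hypothetical. The sketch uses only unique pairs of different trials, so a pattern is never correlated with itself, and it works on raw correlations, whereas the response indicates the authors' indices are Fisher z-transformed.

```python
import numpy as np

def gist_like_index(patterns, categories):
    """Mean within-category minus mean between-category pattern similarity.

    patterns:   (n_trials, n_voxels) array of fixation-period patterns
                (hypothetical input).
    categories: (n_trials,) array of scene-category labels, e.g. "forest".
    """
    patterns = np.asarray(patterns, dtype=float)
    categories = np.asarray(categories)
    corr = np.corrcoef(patterns)                          # trial-by-trial similarity
    same_category = categories[:, None] == categories[None, :]
    # Upper triangle with k=1: each pair once, diagonal (self-correlation) excluded.
    pairs = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    within = corr[pairs & same_category]
    between = corr[pairs & ~same_category]
    if within.size == 0 or between.size == 0:
        raise ValueError("need at least one within- and one between-category pair")
    return within.mean() - between.mean()
```

An index above zero would indicate that fixation-period patterns are more alike for trials of the same category than for trials of different categories. Leaving the diagonal in would add values of r = 1 to the within-category average and inflate the index, which is the concern raised in point (3).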
(2) I did not see any compelling statistical evidence for the claim of less robust consolidation in children.
Specifically in terms of the behavioral results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioral differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be considered when interpreting the findings.
Thank you for highlighting this important issue. We acknowledge that our initial description and depiction of our behavioral findings may not have effectively conveyed the main message about memory consolidation. Therefore, we have revised the behavioral results section (see pp. 12-14) to communicate our message more clearly.
As detailed in the methods section, we reported retention rates only for those items that were correctly (100%) learned on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this strategy allowed us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between groups.
To illustrate more clearly how the slope of the retention rate changes over time for recently learned items (i.e., immediately 30 minutes after learning), short delay remote, and long delay remote items, relative to the initially correctly learned items, we conducted the following analysis: after observing no differences between sessions in both age groups for recent items on days 1 and 14, we combined the recent items. This approach enabled us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group (F(3,250) = 17.35, p < .001, ω² = .16). Follow-up analyses of this interaction revealed significantly less robust memory consolidation across all delay times in children compared to young adults. This information is added in the manuscript in pp. 12-14. We have also updated the figures, incorporating the baseline of 100% correct performance.
(3) Please clarify which analyses were restricted to correct retrievals only. The univariate analyses states that correct and incorrect trials were modelled separately but does not say which were considered in the main contrast (I assume correct only?). The item specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.
Thank you for bringing this to our attention. We indeed limited our analysis – including univariate, specific reinstatement, and gist-like analyses – to only correctly remembered items. This decision was made because our goal was to observe delay-related changes in the neural correlates of correct memories, which are potentially stronger. We have incorporated this information into the manuscript.
(4) To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, and perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?
Thank you for pointing this out. Indeed, there was a significant relationship between the remote > recent difference in the univariate data and memory performance at day 14 across both age groups (see Figure 4C-D). The performance of all participants, including children, was above chance level for remote trials on day 14. In addition, although the number of remote trials was lower in children (18 trials on average) than in adults (22 trials on average), we believe that the number of remote trials was neither too low nor too different across groups for the contrast.
(5) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example, in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?
Thank you for this valuable input. When examining recent and remote retrieval separately, indeed both the anterior and posterior regions of the hippocampus exhibited significant activation from zero in adults (all p < .0003FDRcorr) and children (all p < .014FDRcorr, except for recent posterior hippocampus) during all delays. We include this information in the manuscript (see p. 17) and add it to the supplementary materials (Figure S2, Table S7).
(6) Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.
Thank you for pointing this out. During learning and retrieval, we employed the 3AFC (Three-Alternative Forced Choice) task.
The choices for locations varied across scenes but remained the same over time within individuals. There were 18 different key locations for the objects, distributed across the stimulus set. This means the locations of the objects were quite heterogeneous and differed between objects. The location of the object within the task was presented once during encoding and remained consistent throughout learning. Given the location heterogeneity, we believe our task cannot be reduced to a mere “stimulus/response learning task” but is more accurately described as an object-location association task.
As described above, the options for the 3AFC task did not originate from the same area, as there were 18 different areas in total. The correct answer was distributed equally across the three choice options: sometimes the “correct” answer was the left option, sometimes the middle option, and sometimes the right option. Therefore, we believe that the 3AFC task did not provide clues to the location but required detailed and precise memory of the location. Moreover, the options were not randomly scattered but rather presented close together in the scene, demanding a high level of differentiation between choices.
Taking all the above into consideration, we assert that precise object-location associative memory is necessary for a correct answer. We have added this information to the manuscript (p. 9).
(7) Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.
Thank you for bringing this to our attention. We realize that including this information in the Tables may not be the most straightforward approach. Therefore, we have incorporated the test statistics, effect sizes, and related details into the text of the results section for clarity.
(8) There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.
Thank you for pointing this out. We have added this information to the main manuscript. Additionally, we have emphasized this information in the text referring to Figure 1B.
(9) The retrieval task does not seem to require retrieval of the scene itself, and as such it would be helpful for the authors to both explain their reasoning for this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g. via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on e.g., their strategy.
Thank you for highlighting this important point. Indeed, the retrieval task included explicit instructions for participants to recall and visualize the scene associated with the object presented during the fixation time window. Participants were also instructed to recollect the location of the object within the scene. Since the location was contextually bound to the scene and each object had a unique location in each scene, the location of the object was always embedded in the specific scene context. We have added this information to both the Methods and Results sections.
According to the participants' self-reports (which unfortunately were not systematically collected on all occasions), being able to recall the scene and the location through the stories created during strategic encoding aided their memory for the scene and location immensely. We also concur with your observation that children and young adults may differ in their ability to reinstate scenes, depending on the success of their employed recall strategies. This task was conducted with an awareness of potential developmental differences in the ability to form complex contextual memories. Our elaborative learning procedure was designed to minimize these differences. It is important to note, though, that we did not expect children to achieve performance levels fully comparable to adults. There may indeed be developmental differences in reinstatement, for example due to differences in knowledge availability and accessibility (Brod, Werkle-Bergner, & Shing, 2013). We think that these differences may underlie our findings of neural reinstatement. This is now discussed on pp. 34-35 and 39-43 of the manuscript.
(10) In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.
a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.
Thank you for bringing this to our attention. We have incorporated a summary of the general frameworks of memory consolidation into the introduction. This addition outlines how our summarized findings, particularly those related to memory consolidation for repeatedly learned information, align with these frameworks (see lines 126-138, 146-150).
b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) reads like the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement); while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.
Thank you for highlighting this aspect. We have revised how we present the results from Tompary & Davachi (2017). This study examined memory reorganization for memories both with and without overlapping features, and it observed higher neural similarity for memories with overlapping features over time. The authors also explored item-specific reinstatement for recent and remote memories by assessing encoding-retrieval similarity. Since Oedekoven et al. (2017) utilized a similar approach, their results are comparable in terms of reinstatement. We have updated and expanded our manuscript to clarify the parallels between these studies (see lines 157-162).
c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.
Drawing on the Contextual Binding Theory (Yonelinas et al., 2019), as well as the Multiple Trace Theory (Nadel et al., 2000) and supported for instance by evidence from Sekeres et al. (2018), we hypothesized that detailed contextual memories formed through repeated and strategic learning would strengthen the specificity of these memories, resulting in consistent hippocampal involvement for successfully recalled contextualized detailed memories. We have included additional explanatory information in the manuscript to clarify this hypothesis (see lines 217-219).
d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.
Thank you for raising this important point. Indeed, there are no prior studies that examined memory reinstatement over extended durations in children. The primary existing evidence suggests that neural specificity or patterns of neural representations in children can be robustly observed, while neural selectivity or univariate activation in response to the same stimuli tends to mature later (i.e., Fandakova et al., 2019). Bearing this in mind and recognizing that such neural patterns can be observed in both children and adults, we hypothesized that adults may form stronger detailed contextual memories compared to children. By employing strategies such as creating stories, adults might more easily recall scenes without the need to resort to forming generic or gist-like memories (for example, 'a red fox was near the second left pine tree in a spring green forest'). This assumption aligns with the Fuzzy Trace Theory (Reyna & Brainerd, 1995), which posits that verbatim memories can be created without the extraction of a gist.
Conversely, we hypothesized that children, due to the ongoing maturation of associative and strategic memory components (as discussed in Shing et al., 2008 and 2010), which are dependent respectively on the hippocampus (HC) and the prefrontal cortex (PFC), would be less adept at creating, retaining, and extracting stories to aid their retrieval process. This could result in them remembering more generic integrated information, like the relationship between a fox and some generic image of a forest. We have added explanatory information to the manuscript to elucidate these points (see lines 225-230).
Reviewer #1 (Recommendations For The Authors):
(1) For Figure 3, I would highly recommend changing the aesthetics for the univariate data - at least on my screen they appear to be open boxes with solid vs. dashed lines, and as such look identical to the recent vs. remote distinction in Figure 2B. It also doesn't match the legend for me, which shows the age groups having purple vs. yellow coloring.
Thank you for this observation. We have adjusted Figure 2 (now Figure 3) (please refer to p. 14) accordingly, now utilizing purple and yellow colors to distinguish between the age groups.
(2) Lines 329-330, it is not true that "all" indices were significant from zero but this is only apparent if you read the next sentence. Please rephrase to clarify. e.g., "All ... indices with a few exceptions ... were significantly..."?
Based on the above suggestions and considering our primary focus on time-related changes in scene-specific reinstatement, we will refrain from further interpreting the relative expression of individual scene-specific indices against 0. Consequently, we have removed this information from our analysis.
(3) It is challenging to interpret some of the significance markers, such as those in Figure 3. For example what effects are being denoted by the asterisks and bars above vs. below the data on panel D? Please clarify and/or note in the legend.
We have included a note in the legend to clarify the meaning of all significance markers. In addition, we decided to state any significant main and interaction effects in the figure rather than use significance markers.
(4) For Figures 2 and 3, only the meaning of error bars is described in the caption. It is not explained in the caption what the boxes, lines, and points denote. Please clarify.
Thank you for highlighting this. We have added explanations to the figure's annotation for clarity. Please note that, in light of the other reviewers' suggestions, the figure plots may have been adjusted or changed, with corresponding adjustments to the explanations in the figure annotation.
(5) How were recent and remote interspersed relative to one another? The text says that each run had 10 recent and 10 remote pairs, presented in a "pseudo-random order" - not clear what that (pseudo) means in this case. Please clarify.
Thank you for raising this point. We provide this information in the Methods section “Materials and Procedure”: 'The jitters and the order of presentation for recent and remote items were determined using OptimizeXGUI (Spunt, 2016), following an exponential distribution (Dale, 1999). Ten unique recently learned pairs (from the same testing day) and ten unique remotely learned items (from Day 0) were distributed within each run (in total three runs) in the order as suggested by the software as the most optimal. There were three runs with unique sets of stimuli each resulting in thirty unique recent and thirty unique remote stimuli overall.'
(6) Figure 1A, second to last screen on the learning cycles row - what would be presented to participants here, one of these three emojis? What does the sleepy face represent? I see some of these points were mentioned in the methods, but additional clarification in the caption would be helpful.
Thank you for highlighting this. We have included this information in the figure caption. Specifically, the sleepy face symbol in the figure denotes a 'missed response'.
(7) Not clear how the jittered fixation time between object presentation and scene test is dealt with in representational similarity analyses.
Thank you for pointing this out. Beta estimates were obtained from a Least Squares Separate (LSS) regression model. Each event was modeled with its respective onset and duration and, as such, one beta value was estimated per event (with the lags between events differing from trial to trial). We have edited the corresponding section (see p. 53).
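The "one model per event" logic of LSS can be sketched as follows. This is a simplified illustration and not the authors' pipeline: the function name, the onsets, durations, repetition time, and number of scans are all hypothetical, and the regressors are plain boxcars (a real analysis would convolve them with a hemodynamic response function and add nuisance regressors before fitting each model). For every event, a separate design is built with one regressor for that event and one regressor pooling all other events; the beta of the first regressor is the estimate kept for that event.

```python
def lss_designs(onsets, durations, n_scans, tr):
    """Build one design per event: a target regressor and an all-other-events regressor."""

    def boxcar(onset, duration):
        # 1.0 for every scan whose acquisition time falls inside the event window.
        return [
            1.0 if onset <= scan * tr < onset + duration else 0.0
            for scan in range(n_scans)
        ]

    boxcars = [boxcar(o, d) for o, d in zip(onsets, durations)]
    designs = []
    for target in range(len(boxcars)):
        # Pool every event except the target into a single nuisance regressor.
        others = [
            sum(col[s] for i, col in enumerate(boxcars) if i != target)
            for s in range(n_scans)
        ]
        designs.append({"target": boxcars[target], "others": others})
    return designs


# Hypothetical jittered onsets (in seconds); the gaps between events differ.
designs = lss_designs(onsets=[0.0, 6.0, 14.0], durations=[2.0, 2.0, 2.0],
                      n_scans=20, tr=1.0)
print(len(designs), "separate models, one per event")
```

Because each event enters its own model with its own onset, unequal lags between events are handled by the design itself rather than requiring events to be aligned.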
(8) It was a little bit strange to have used anterior vs posterior HPC ROIs separately in univariate analysis but then combined them for multivariate. There are many empirical and theoretical motivations for looking at item-specific and category reinstatement in anterior and posterior HPC separately, so I was surprised not to see this. Please explain this reasoning.
Thank you for pointing this out. We agree with the reviewer and have included the anterior and posterior HC ROIs in the multivariate analysis. Please see the revised results section (pp. 13-15).
(9) The term "neural specificity" is introduced (line 164) without explanation; please clarify.
Thank you for bringing this to our attention. The term ‘neural specificity’ refers to the neural representational distinctiveness of information. In other words, ‘neural specificity,’ as defined by Fandakova et al. (2019), refers to the distinctiveness of neural representations in the regions that process that sensory input. We decided, however, to refrain from using this term and instead to use neural representational distinctiveness, which is more self-explanatory and was also introduced in the manuscript.
(10) Age range is specified as 5-7 years initially (line 187) and then 6-7 years (line 188).
We have corrected the age range in line 188 to '5 to 7 years.'
Reviewer #2 (Public Reviews):
Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval in children in middle childhood compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. Despite these strengths, there are quite a few important design and analytical choices that derail my enthusiasm for the paper. If the authors could address these concerns, this manuscript would provide a solid foundation to better understand memory consolidation in children.
We thank the reviewer for both the positive and critical appraisal of our paper.
Reviewer #2 (Recommendations For The Authors):
(1) My greatest concern is the difference in memory accuracy that emerges as soon as immediate learning, which undermines the interpretation of any consolidation-related differences. This concern is two-fold. The authors utilize an adaptive learning approach in which participants learn to criterion or stop after 4 repetitions. This type of approach leads to children seeing the stimuli more often during learning compared to adults, which on its own could have consequences for consolidation-related neural markers. Specifically, within adults, theoretical and empirical work shows that repeating information can actually lead to more gist-like representations, which is the exact profile the children are showing. While there could be a strength to this approach because it allows for equivalent memory, the decision to stop repetitions before criterion means that memory performance is significantly lower in the children, which again could have consequences for consolidation-related neural markers. First, the authors do not show any of the learning-related data which would be critical to assess the impact of this design choice. Second, there are likely differences in memory strength at the delay, making it extremely difficult to determine if the neural markers reflect development, worse memory strength, or both. This issue is compounded by the use of a 3-AFC paradigm, wherein "correct responses" included in the analysis could contain a significant amount of guessing responses. I think a partial solution to this problem is to analyze the RT data and include them in the analyses or use a drift-diffusion modeling approach to get more precise estimates of memory strength to control for this feature. An alternative is to sub-select participants in each group to have a sample matched on performance (including # of repetitions) and re-run all the analyses in this sub-sample. Without addressing these concerns it is near impossible to interpret the presented data.
Thank you for highlighting this point.
Firstly, we believe that our approach, involving strategic and repeated learning coupled with feedback, enhances the formation of detailed contextual memories. The retrieval procedure also emphasized the need for detailed memory for location. These are critical differences in experimental procedure from previous studies, which enhanced the importance of detailed representations and likely reduced the likelihood of forming gist-like memories.
Indeed, we ceased further learning after the fourth repetition. Extensive piloting, where we initially stopped after the seventh repetition, showed no improvement beyond the fourth repetition. In fact, performance tended to decline due to fatigue. Therefore, we limited the number of repetition cycles to the point where an improvement in performance was still feasible. Even though children exhibited lower final learning performance overall, we believe our procedure helped them reach their maximal performance within the experimental setup.
To address the reviewer’s concern, we included learning data to illustrate the progression of learning (see Fig. 1C, pp. 9-10 in Results).
When interpreting the retention rates, it is important to note that we reported retention rates only for items that were correctly learned (100%) on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this method enabled us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between the groups. To simultaneously examine the change in retention rate slopes over time for recent (30 minutes after learning), short delay (one night after) remote, and long delay (two weeks after) remote items, we conducted a separate analysis of retention rates for recent items on days 1 and 14. After observing no differences between sessions in both age groups, we combined the data for recent items. This allowed us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group. Analysis of this interaction revealed significantly less robust memory consolidation across all delay times for children compared to young adults. The figures have been adjusted accordingly to incorporate the baseline of 100% correct performance.
Following your suggestion, we also employed the drift diffusion model approach to characterize memory strength, calculating drift rate, boundary and non-decision time parameters. We added the results to the Supplementary Materials (section S2.1, Figure S1).
Generally, our findings indicate a lower overall drift rate in children when considering all items that had to be learned. We also observed that adults showed a steeper decline in drift rate across the short and long delays, although their memory strength at these delays remained higher than that of children. Both age groups required a similar amount of evidence to make a decision, and this amount declined with delay, which may indicate an adaptation to weaker memory. Further, we observed shorter non-decision time in children than in adults, potentially suggesting less error checking, or less thorough processing and strategy-based memory access, in children.
Overall, these results indicate weaker memory strength in children as a quantitative measure. This may nevertheless stem from the qualitatively different memory representations that children form, as our RSA findings suggest. We believe that memory strength is part of the effect of interest in our neural data (i.e., worse memory due to lower memory strength in children); controlling for it would remove variance of interest from the neural data. Therefore, we will refrain from including memory strength in the model. However, we will include mean RT as an indicator of general response tendencies.
Given that the paper is already very complex and long, we opted to add the diffusion model results to the Supplementary Materials (section S2.1, Fig. S1), while discussing the results in the discussion (p. 35).
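For readers unfamiliar with the three diffusion parameters named above (drift rate, boundary, and non-decision time), the sketch below shows how they can be derived from accuracy and response times with the closed-form EZ-diffusion equations. This is an assumption-laden illustration and not the fitting procedure used in the manuscript, which is not described in this response. EZ-diffusion was developed for two-choice tasks, so applying it to correct vs. incorrect responses in a three-alternative task is a simplification; the function name and the example values are hypothetical.

```python
from math import copysign, exp, log


def ez_diffusion(prop_correct, rt_variance, rt_mean, s=0.1):
    """Closed-form EZ-diffusion estimates from accuracy and correct-RT statistics.

    Returns (drift rate, boundary separation, non-decision time), with RTs in
    seconds. Accuracy must not be exactly 0, 0.5, or 1, because the equations
    are degenerate there and need an edge correction first.
    """
    if prop_correct in (0.0, 0.5, 1.0):
        raise ValueError("accuracy of 0, 0.5, or 1 needs an edge correction first")
    logit = log(prop_correct / (1 - prop_correct))
    x = logit * (
        logit * prop_correct**2 - logit * prop_correct + prop_correct - 0.5
    ) / rt_variance
    drift = copysign(1.0, prop_correct - 0.5) * s * x**0.25
    boundary = s**2 * logit / drift
    y = -drift * boundary / s**2
    mean_decision_time = (boundary / (2 * drift)) * (1 - exp(y)) / (1 + exp(y))
    # Whatever part of the mean RT is not decision time is non-decision time.
    return drift, boundary, rt_mean - mean_decision_time


# Hypothetical summary statistics for one participant in one condition.
drift, boundary, non_decision = ez_diffusion(0.80, 0.11, 0.72)
print(f"drift={drift:.3f}, boundary={boundary:.3f}, non-decision={non_decision:.3f}")
```

In this parameterization, a higher drift rate corresponds to stronger memory evidence, a larger boundary to more evidence required before responding, and non-decision time to processes outside the decision itself.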
(2) More discussion of the behavioral task should be included in the results, in particular the nature of the adaptive learning paradigm including the behavioral results as well as the categorical nature of the memoranda. Without this information, it is difficult for the reader to understand what category-level versus item-level reinstatement reflects.
Thank you for this valuable input. We have incorporated this information into the results section. Please refer to pp. 9-10, 12, 14, 21, 25-26 for the added details.
(3) Some of the methods for the reinstatement analysis were unclear to me or warranted further adjustment. I believe the authors compared the scene against all other scenes. I believe it would be more appropriate to only compare this against scenes drawn from the same category as opposed to all scenes. Secondly, from my reading, it seems like the reinstatement was done during the scene presentation, rather than the object presentation in which they would retrieve the scene. I believe the reinstatement results would be much stronger if it was captured during the object presentation rather than the re-presentation of the scene. Or perhaps both sets of analyses should be included.
We apologize for the confusion regarding the analysis method.
During the review process we have improved the description of this analysis and hope it is easier to follow now. In short, we used both approaches (within and between categories) to suit different goals (i.e., measuring scene-reinstatement and gist-like reinstatement).
Both types of reinstatement were assessed during the fixation cross to avoid confounds with the object itself being on the screen. We only used the scene window in one analysis (scene-reinstatement index) as a neural template to track its pre-activation during the fixation. So, as the reviewer suggests, our rationale is that the reinstatement indeed starts taking place at the short object presentation window, but importantly, extends to the fixation window. We added this clarifying information to the results section (see p. 21-27).
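The general logic of using the scene window as a template for activity in the fixation window can be sketched as a similarity contrast. This is a generic illustration under stated assumptions, not the exact index defined in the manuscript: the function names are hypothetical, patterns are plain lists of voxel values, and the contrast is simply the mean Fisher z-transformed correlation for matching fixation-scene pairs minus that for non-matching pairs. Restricting the comparison set to scenes from the same category, as the reviewer suggests, would amount to passing only the patterns of one category.

```python
from math import atanh, sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant patterns."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def reinstatement_index(fixation_patterns, scene_patterns):
    """Mean Fisher-z similarity of matching pairs minus non-matching pairs."""
    same, other = [], []
    for i, fixation in enumerate(fixation_patterns):
        for j, scene in enumerate(scene_patterns):
            # Clamp so that rounding to exactly +/-1 cannot break atanh.
            r = max(-0.999999, min(0.999999, pearson(fixation, scene)))
            (same if i == j else other).append(atanh(r))
    return sum(same) / len(same) - sum(other) / len(other)


# Hypothetical 4-voxel patterns for three trials.
scenes = [[1, 2, 3, 4], [4, 1, 3, 2], [2, 4, 1, 3]]
fixations = [[1.2, 1.9, 3.1, 3.7], [3.8, 1.3, 2.9, 2.2], [2.1, 3.9, 1.2, 2.7]]
print(reinstatement_index(fixations, scenes))
```

A positive index means that the fixation-window pattern resembles its own scene more than it resembles other scenes, which is the signature of scene-specific pre-activation.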
(4) For the univariate results, it was unclear to me when reading the results whether they were focusing on the object presentation portion of the trial or the scene presentation portion of the trial. Again, I think the claims of reinstatement related activity would be stronger if they accounted for the object presentation period.
Thank you for pointing this out. Indeed, the univariate results were based on the object presentation time window. We added this information to the results section (Fig. 3, pp. 14, 16).
(5) Further, given the univariate differences shown across age groups, the authors should re-run all analyses for the RSA controlling for mean activation within the ROI.
Thank you for highlighting this. We re-ran all RSA analyses controlling for the mean activation within the ROI. The results remained unchanged. We have added this information to the results section, and further details are provided in Tables S8 and S11 of the Supplementary Materials.
(6) The authors should include explicit tests across groups for their brain-behavior analyses if they want to make any developmentally relevant interpretations of the data. Also, it would be helpful to include similar analyses to those using the univariate signals, and not just the RSA results.
Following the reviewer's suggestion, we included brain-behavior analyses for univariate data as well as RSA data with explicit tests across groups. These can be found in the Results Section pp. 18-20, 28-32. Due to the interdependence of predefined ROIs and to avoid running a high number of correlation tests, we employed partial least squares correlation analysis for this purpose. This approach focuses on multivariate links between specified Regions of Interest (ROIs) and fluctuations in memory performance over short and long delays across different age cohorts. We argue that this multivariate strategy offers a more comprehensive understanding of the relationships between brain metrics across various ROIs and memory performance, given their mutual dependence and connectivity (refer to Genon et al. (2022) for similar discussions).
(7) There could be dramatic differences in memory processing across 5-7 year olds. I know the sample is a little small for this, but I would like to see regressions done within the middle childhood group in addition to the across-group comparisons.
We have included information detailing the relationship between memory retention rate and age within the child group (refer to p. 13). In the child group, both recent and short delay remote memory improved with age. However, the retention rate for long-delayed memory did not show a significant improvement with increasing age in children.
(8) I am concerned that the authors used global-signal as a regressor in their first-level analyses, given that there could be large changes in the amount of univariate activation that occurs across groups. This approach can lead to false positives and negatives that obscure localized differences. The authors should remove this term, and perhaps use the mean sum of the white matter or CSF to achieve the noise regressor they wanted to include.
We understand the reviewer's concerns. However, we believe that our approach is recommended for pediatric populations. Specifically, Graff et al., 2021, found that global signal regression is a highly efficacious denoising technique in their study of 4 to 8-year-old children. This technique was previously suggested for adults by Ciric et al., 2017, and the benefits in terms of motion and physiological noise removal outweigh the potential costs of removing some signal of interest, as indicated by Behzadi et al., 2007. Additionally, we incorporated six anatomical component-based noise correction (CompCor) components to account for WM and CSF signals, as recommended in the pediatric literature.
(9) The authors discuss the relationship between hippocampal reactivation and worse memory through the lens of Schapiro et al., but a new paper by Tanriverdi et al came out in JOCN recently that is more similar to the authors' findings.
Thank you for highlighting the recent paper by Tanriverdi et al. in JOCN, which aligns closely with our findings. We appreciate the suggestion and agree that exploring this alignment could further enrich our discussion on the relationship between hippocampal reactivation and memory retention. We incorporated this work in our revised manuscript.
Minor Comments
- I was surprised that the authors did not see any differences in univariate signals for memory retrieval as a function of development, as much of the prior work has shown differences (for example work by Tracy Riggins). I believe this contrast should be highlighted in the discussion.
- Given the robust differences in sleep patterns across childhood and the role of sleep in systems consolidation framework, I think this feature should be highlighted in either the introduction or discussion.
- Could the authors report on differences (or lack of differences) in head motion across the groups, and if they are different whether they could include them as a confounding variable.
I believe we included six motion parameters and their derivatives into the model
Thank you for your comments.
First, prior work on univariate signals of memory retrieval focused mostly on remembered vs. forgotten contrasts, whereas our study focused on remote vs. recent contrasts at short and long delays for correctly remembered items only. This can partially explain the results. We highlighted this information in the discussion section.
Second, we agree with the reviewer that sleep patterns across childhood should be addressed in the analysis. Therefore, we incorporated them in the discussion section.
Third, head motion parameters were indeed included in the analysis as confounding variables, as adding them is highly recommended for developmental populations (e.g., Graff et al. 2021). As an example, we observed higher framewise displacement in children compared to adults, t(218) = -16, p < .001, as well as in translation along the y axis, t(288) = -2.33, p = .02.
Reviewer #3 (Public Reviews):
Summary:
This study aimed to understand the neural correlates of memory recall over short (1-day) and long (14-days) intervals in children (5-7 years old) relative to young adults. The results show that children recall less than young adults and that this is accompanied by less activation (relative to young adults) in brain networks associated with memory retrieval.
Strengths:
This paper is one of few investigating long-term memory (multiple days) in a developmental population, an important gap in the field. Also, the authors apply a representational similarity analysis to understand how specific memories evolve over time. This analysis shows how the specificity of memories decreases over time in children relative to adults. This is an interesting finding.
We thank the reviewer for the appraisal of our manuscript.
Weaknesses:
Overall, these results are consistent with what we already know: recall is worse in children relative to adults (e.g., Cycowicz et al., 2001) and children activate memory retrieval networks to a lesser extent than adults (Bauer et al, 2017).
It seems that the reduced activation in memory recall networks is likely associated with less depth of memory encoding in children due to inattentiveness, reduced motivation, and documented differences in memory strategies. In regard to this, there was consideration of IQ, sex, and handedness but these were not included as covariates as they were not significant although I note p<.16 suggests there was some level of association nonetheless. Also, IQ is measured differently for the children and adults so it's not clear these can be directly contrasted. The authors suggest the instructed elaborative encoding strategy is effective for children and adults but the reference in support of this (Craik & Tulving, 1975) does not seem to support this point.
Thank you for your review, and we appreciate your valuable feedback. Here are our responses and clarifications:
Regarding the novelty of the results relative to the existing literature mentioned, we believe that, in contrast to Cycowicz et al. (2001), Bauer et al. (2017), and others, we assess not only immediate memory after encoding with semantic judgement of abstract associations, but also add to these findings by investigating consolidation-related changes in complex associative and contextual information in the much under-investigated sample of 5-to-7-year-old preschoolers. This also allows us to infer how children's neural representations change over time, providing invaluable insights into knowledge formation in this developmental cohort.
Accordingly, the observed age differences are of less primary importance than the time-related changes in mnemonic representations observed in children.
Regarding the assumption of inattentiveness in children, we want to emphasize that the experimenter was present throughout the learning process, closely supervising the children. We observed prompt responses to every trial in children and noted an increase in accuracy over the encoding-learning cycles, leading us to conclude that the children were indeed attentive to the task. The observed accuracy improvement across learning cycles indicates an increase in remembered information. Furthermore, we took measures to ensure their engagement, including extensive training in both verbal and computerized versions to ensure that they understood and actively created stories to support their learning.
We collected motivation data after each task execution in children, and the results indicated that they scored high in motivation. Children not only completed the tasks but also expressed their willingness to participate in subsequent appointments, highlighting their active involvement in the study.
The observed differences in the efficiency of strategy utilization were expected, given developmental differences in the associative and strategic components of memory in children, as noted in prior research (Shing, 2008, 2010).
We appreciate your point about IQ, sex, and handedness. These variables were indeed included in the behavioral models, and mean brain activation was also included in the brain data models, addressing the potential influence of these factors on our results.
While it's true that we applied different tests to measure IQ in children and adults, these tests targeted comparable subtests that addressed similar cognitive constructs. As the final IQ values are standardized, we believe it is appropriate to compare them between the two groups.
Lastly, we agree that the citation Craik & Tulving, 1975 supports the notion of effectiveness of instructed elaborative learning only in adults, but not in children. For this purpose, we added relevant literature for the child cohort (i.e., Pressley, 1982; Pressley et al., 1981; Shing et al., 2008).
Reviewer #3 (Recommendations For The Authors):
An additional point for the authors to consider is that the hypotheses were uncertain. The first is that prefrontal, parietal, cerebellar, occipital, and PHG brain regions would have greater activation over time in adults and not children - which is very imprecise as this is basically the whole brain. Moreover, brain imaging data may be in opposition to this prediction: e.g., the hippocampus has a delayed maturational pattern beyond 5-yrs (e.g., Canada 2019; Uematsu 2012) and some cortical data predicts earlier development in these regions.
Thank you for your feedback, and we appreciate your insights regarding our hypotheses.
The selection of our regions of interest (ROIs) was guided by prior literature that has demonstrated the interactive involvement of multiple brain areas in memory retrieval and consolidation processes. Additionally, our recent work utilizing multivariate partial least square correlation analysis (Schommartz, 2022, Developmental Cognitive Neuroscience) has indicated that unique profiles derived from the structural integrity of multiple brain regions are differentially related to short and long-delay memory consolidation.
Indeed, the literature suggests that the hippocampus may exhibit a more delayed maturational pattern extending into adolescence, as supported by studies such as Canada (2019) and Uematsu (2012), etc. We added this information as well as findings from the literature on cortical development to be more balanced in our review of the literature.
Given this complexity, we believe it is important to emphasize in our discussion that both the medial temporal lobe, including the hippocampus, and cortical structures, as well as the cerebellum, undergo profound neural maturation. We highlight these nuances in our revised manuscript to provide a more comprehensive perspective on the developmental differences in memory retention over time.
The writing was challenging to follow - consider as an example on page 9 the sentence that spans 10 lines of text.
Thank you for bringing this to our attention. We have carefully reviewed the manuscript and have made efforts to streamline the text, ensuring that sentences are not overly long or complex to improve readability and comprehension.
I found the analysis (and accompanying figures) a bit of a data mine - there are so many results that are hard to digest and in other cases highly redundant one from the other. This may be resolved in part by moving redundant findings to the supplemental. Some were hard to follow - so when there is a line between recent and recent data, that seems confusing to connect data that, I believe, are different sets of items. Later scatterplots (Fig 7) have pale yellow dots that I had a hard time seeing.
Thank you for bringing up your concerns regarding the analysis and figures in our manuscript. We have carefully considered your feedback and made several improvements to address these issues.
To alleviate the challenge of digesting numerous results, we have taken steps to enhance clarity and reduce redundancy. Specifically, we have moved some of the redundant findings to the supplementary sections, which should help streamline the main manuscript and make it more reader friendly.
Regarding the line between 'recent' and 'recent data,' the figures were revised to be clearer. Furthermore, we have improved the visibility of certain elements, such as the pale-yellow dots in the scatterplots (Fig 1, 2, 4, etc.), to ensure that readers can better discern the data points.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
The authors show that short bouts of chemical ischemia lead to presynaptic changes in glutamate release and long-term potentiation, whereas longer bouts of chemical ischemia lead to synaptic failure and presumably cell death. This convincing work relies on rigorous electrophysiology/imaging experiments and data analysis. It is important as it provides new mechanistic details on chemical ischemia, which could offer potential insights into ischemic stroke in vivo.
-
Reviewer #1 (Public Review):
Summary:
This work by Passlick and colleagues set out to reveal the mechanism by which short bouts of ischemia perturb glutamate signalling. This manuscript builds upon previous work in the field that reported a paradoxical increase in synaptic transmission following acute, transient ischemia, termed ischemic or anoxic long-term potentiation. Despite these observations, how this occurs and the involvement of glutamate release and uptake mechanisms remained unanswered.
Here the authors employed two distinct chemical ischemia models, one lasting 2-minutes, the other 5-minutes. Recording evoked field excitatory postsynaptic potentials in acute brain slices, the authors revealed that shorter bouts of ischemia resulted in a transient decrease in postsynaptic responses followed by an overshoot and long-term potentiation. Longer bouts of chemical ischemia (5-minutes), however, resulted in synaptic failure that did not return to baseline levels over 50-minutes of recording (Figure 1).
Two-photon imaging of the fluorescent glutamate sensor iGluSnFR expressed in astrocytes matched the postsynaptic responses: shorter ischemia resulted in a transient dip followed by an increase in extracellular glutamate, which was not the case with prolonged ischemia (Figure 2).
Mechanistically, the authors show that these increased glutamate levels and postsynaptic responses were not due to changes in glutamate clearance (Figure 3). Next, using a competitive antagonist for postsynaptic AMPA receptors, the authors show that synaptic glutamate release was enhanced by 2-minute chemical ischemia.
Taken together, these data reveal the underlying mechanism regarding ischemic long-term potentiation, highlighting presynaptic release as the primary culprit. Additionally, the authors show relative insensitivity of glutamate uptake mechanisms during ischemia, highlighting the resilience of astrocytes to this metabolic challenge.
-
Reviewer #2 (Public Review):
Summary:
To investigate the impact of chemical ischemia induced by blocking mitochondrial function and glycolysis, the authors measured extracellular field potentials, performed whole-cell patch-clamp recordings, and measured glutamate release with optical techniques. They found that a shorter, two-minute blockade of energy production initially blocked synaptic transmission but subsequently caused a potentiation of synaptic transmission due to increased glutamate release. In contrast, a longer, five-minute blockade of energy production caused a sustained decrease of synaptic transmission. A correlation between the increase of extracellular potassium concentration and the response upon chemical ischemia indicates that the severity of the ischemia determines whether synapses potentiate or depress upon chemical ischemia. A subsequent mechanistic analysis revealed that the speed of uptake of glutamate is unchanged. An increase in the duration of the fiber volley, which reflects the extracellular voltage of the action potentials of the axon bundle, was interpreted as action potential broadening, which could provide a mechanistic explanation. In summary, the data convincingly demonstrate that synaptic potentiation induced by chemical ischemia is caused by increased glutamate release.
Strengths:
The manuscript is well written, and the experiments are carefully designed. The results are exciting, novel, and important for the field. The main strength of the manuscript is the combination of electrophysiological recordings and optical glutamate imaging. The main conclusion of increased glutamate release was furthermore supported with an independent approach relying on a low-affinity competitive antagonist of glutamate receptors. The data are of exceptional quality. Several important controls were carefully performed, such as the stability of the recordings and the size of the extracellular space. The number of experiments is sufficient for the conclusions. The careful data analysis justifies the classification of two types of responses, namely synaptic potentiation and depression after chemical ischemia. The data are carefully discussed and the conclusions are justified.
Weaknesses:
The weaknesses are minor. The authors measured the fiber volley, which reflects the extracellular voltage of the compound action potential of the fiber bundle. The half-duration of the fiber volley was increased. These results are consistent with action potential broadening in the axons but the action potential broadening was not experimentally demonstrated. However, these results are carefully discussed.
-
Reviewer #3 (Public Review):
Summary:
This valuable study shows that shorter episodes (2 min duration) of energy depletion, as occurs in ischemia, could lead to long-lasting dysregulation of synaptic transmission with presynaptic alterations of glutamate release at the CA3-CA1 synapses. A longer duration of chemical ischemia (5 min) permanently suppresses synaptic transmission. By using electrophysiological approaches, including field and patch clamp recordings, combined with imaging studies, the authors demonstrated that 2 min of chemical ischemia leads to a prolonged potentiation of synaptic activity with a long-lasting increase of glutamate release from presynaptic terminals. This was observed as an increase in the fluorescence of iGluSnFR, a sensor for glutamate expressed selectively on hippocampal astrocytes by viral injection. The increase in iGluSnFR fluorescence upon 2 min chemical ischemia could not be ascribed to altered glutamate uptake, which is unaffected by both 2 min and 5 min chemical ischemia. The presynaptic increase in glutamate release upon short episodes of chemical ischemia is confirmed by a reduced inhibitory effect of the competitive antagonist gamma-D-glutamylglycine on AMPA receptor mediated postsynaptic responses. Fiber volley durations in field recordings are prolonged in slices exposed to 2 min chemical ischemia. The authors interpret these data as an indication that the increase in glutamate release could be ascribed to a prolongation of the presynaptic action potential, possibly due to inactivation of voltage-dependent K+ channels. However, more direct evidence is needed to fully support this hypothesis. This research highlights an important mechanism by which altered ionic homeostasis underlying metabolic failure can impact neuronal activity. Moreover, it also shows a different vulnerability of mechanisms involved in glutamatergic transmission, with a marked resilience of glutamate uptake to chemical ischemia.
Strengths:
(1) The authors use a variety of experimental techniques ranging from electrophysiology to imaging to study the contribution of several mechanisms underlying the effect of chemical ischemia on synaptic transmission.
(2) The experiments are appropriately designed and clearly described in the figures and in the text.
(3) The controls are appropriate.
Weaknesses:
- The results are obtained in an ex-vivo preparation.
Impact:
This study provides a more comprehensive view of the long term effects of energy depletion during short episodes of experimental ischemia leading to the notion that not only post-synaptic changes, as reported by others, but also presynaptic changes are responsible for long-lasting modification of synaptic transmission. Interestingly, the direction of synaptic changes is bidirectional and dependent on the duration of chemical ischemia, indicating that different mechanisms involved in synaptic transmission are differently affected by energy depletion.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public Review):
[…]
Weaknesses:
The question of the physiological relevance of short bouts of ischemia remains.
The chemical ischemia protocol induces a duration-dependent ATP depletion in acute slices on a time scale of minutes (Pape and Rose 2023). This is about the same time scale as the peri-infarct depolarisation (Lauritzen et al. 2011) that the protocol attempts to model. Of course, such models do not completely replicate the complex situation in vivo. However, the presented analyses of synapse function cannot be performed in vivo. We discuss this now in the manuscript.
The precise mechanisms underlying the shift between ischemia-induced long-term potentiation and long-term failure of synaptic responses were not addressed. Could this be cell death?
Thank you for the comment. Yes, we indeed believe that the persistent failure of synaptic transmission is because of neuronal cell death (i.e., of CA1 pyramidal cells) or at least persistent depolarisation. We did not explicitly state that in the original submission but do so in the revised manuscript. It is supported by the unquantified observation of swelling and/or loss of integrity of CA1 pyramidal cell bodies in parallel to postsynaptic failure. It is also in line with many reports from the literature, of which we now cite two (lines 186-198).
Sex differences are not addressed or considered.
We have performed all experiments on male mice, as indicated in Material and Methods. We have indeed not addressed sex differences of the observed effects. We consider this, and many other important factors, to be interesting topics for follow-up studies. This is now discussed (lines 413-424).
Reviewer #2 (Public Review):
[…]
Weaknesses:
The weaknesses are minor and only relate to the interpretation of some of the data regarding the presynaptic mechanisms causing the potentiation of release. The authors measured the fiber volley, which reflects the extracellular voltage of the compound action potential of the fiber bundle. The half-duration of the fiber volley was increased, which could be due to the action potential broadening of the individual axons but could also be due to differences in conduction velocity. We are therefore skeptical whether the conclusion of action potential broadening is justified.
These are excellent points. We have added an analysis demonstrating that axonal conduction velocity is unlikely to be affected. Nonetheless, the fiber volley is indeed an indirect measure of what happens in individual axons. We have adjusted our interpretation accordingly and now also discuss alternative explanations of our findings (lines 363-379).
Reviewer #3 (Public Review):
[…]
Weaknesses:
The data on fiber volley duration should be supported by more direct measurements to prove that chemical ischemia increases presynaptic Ca2+ influx due to a presynaptic broadening of action potentials. Given the influence that positioning of the stimulating and recording electrode can have on the fiber volley properties, I found this data insufficient to support the assumption of a relationship between increased iGluSnFR fluorescence, action potential broadening, and increased presynaptic Ca2+ levels.
We have added a new analysis showing that the latency of the fiber volley is unaffected and relatively constant, which strengthens our conclusion. But the fiber volley is indeed an indirect measure of action potential firing in individual axons. The suggested experiment, which would require simultaneous recording of Ca2+ and action potentials in single axons in combination with chemical ischemia, is extremely difficult, if possible at all. Instead, we have extended the discussion and include now further alternative mechanistic explanations (lines 363-379).
The results are obtained in an ex-vivo preparation; it would be interesting to assess if they could be replicated in in vivo models of cerebral ischemia.
This would certainly be very interesting but also extremely challenging technically. For a detailed analysis of synaptic changes as presented here, the main difficulty will be to stimulate and visualise glutamate release exclusively in an isolated population of synapses while recording postsynaptic responses in a stroke model.
Recommendations For The Authors:
Reviewer #1 (Recommendations For The Authors):
[…]
Labelling of experimental groups as 2-minute and 5-minute chemical ischemia is more accurate than "metabolic stress" and "with postsynaptic failure". The critical difference between these two conditions is lost with this nomenclature. The reader could be misled to believe that the two groups form a heterogeneous population of responses from the same experimental manipulation, which is incorrect.
We had stated in the manuscript that we ‘ … grouped combined iGluSnFR and electrophysiological recordings according to the effect of chemical ischemia on the synaptic response: ‘chemical ischemia with postsynaptic failure’ if the postsynaptic response did not recover to above 50% of the baseline level and ‘chemical ischemia’ when it did (as indicated in Fig. 1H). …’. The recordings were not grouped according to chemical stress duration but according to the effect on the postsynaptic response. We have revised the text explaining this (lines 125-135) and illustrate that now also in Fig. 1H. We hope this is easier to follow now.
More details on the long-term impact of 5-minute ischemia on cell viability would be enlightening regarding the specific mechanism separating these two conditions. With 2 minutes it would appear that cells remain alive (i.e. intact post-synaptic responses), whereas 5 minutes appears to induce cell death.
Yes, our observations, although not quantified, are in line with cell death as CA1 pyramidal cell bodies appeared swollen and/or lost their integrity when chemical ischemia was followed by postsynaptic failure. This is also in line with reports from the literature. We have revised the results section accordingly (lines 186-201).
In the paragraph titled "glutamate uptake is unaffected after acute chemical ischemia", there are two erroneous citations of Figure S3 that should be Figure S4.
Thank you. We corrected this mistake.
The sex of animals is not given. This is essential information.
We used male mice as indicated in the initial version of the manuscript (Material and Methods). We have added a statement regarding the role of sex to the final section of the Discussion.
Reviewer #2 (Recommendations For The Authors):
We propose addressing the weaknesses mentioned in the public review. As said, the fibre volley is a very indirect measure of action potential broadening. Based on the iGluSnFR data, the authors predict that the potentiation is mediated by depolarization, action potential broadening, and increased presynaptic calcium influx. The latter could be tested experimentally, but this does not seem necessary if the data are interpreted more cautiously. For example, other explanations for the broadened fiber volley could be mentioned, such as a slowing and/or dispersion of the action potential propagation speed. Furthermore, depolarization could cause elevated resting calcium concentrations, which could potentiate release independently of action potential broadening. Finally, classical forms of presynaptic potentiation of the release machinery that occur during homeostatic plasticity or Hebbian plasticity may operate independently of calcium dynamics.
Thank you for this comment. The discussion of the mechanism was indeed too short. We have added an analysis of the fiber volley delay after stimulation, which was not affected. Presynaptic action potential broadening is, in our opinion, a very likely explanation for our observations but we did not perform direct experiments. Directly recording presynaptic action potentials and Ca2+ transients in the chemical ischemia model over extended periods of time is a major technical challenge and certainly of interest in the future. As suggested, we have expanded the discussion section and now mention various alternative explanations (lines 363-379).
There are the following minor suggestions:
Add line numbers.
We have added line numbers.
We would suggest providing exact P values instead of asterisks in the figures.
We agree that having exact P values in the figure panels can be very helpful. However, in the present figures they are hard to integrate without overcrowding the already complex panels and thereby obscuring other important details. All p-values are included in the figure legends and/or main text.
Abstract: "We also observed an unexpected hierarchy of vulnerability of the involved mechanisms and cell types." This sentence is hard to understand and cell types were not directly compared (i.e. axons of CA3 and axons of CA1 neurons were not compared).
We have revised this statement and removed the reference to cell types.
In Figure 1G there seems to be an increase in the fiber volley. Is this significant? Could this be due to swelling of the slice during chemical ischemia? Or an increase in excitability? Maybe this could be discussed.
The effect was analysed in the context of Fig. 2. A significant increase of the fiber volley amplitude was detected in chemical ischemia (Fig. 2H) but also under control conditions (Fig. 2F). We therefore consider this a change that is detectable but not related to chemical ischemia and not a potential explanation for increased glutamate release (lines 157-160). Also, no significant fiber volley increase was detected in chemical ischemia with postsynaptic failure (Fig. 2H) and in the experiments illustrated in Fig. 4E. Our interpretation is that the fiber volley unspecifically increases in some experiments over the time course of the experiment (~ 60 min) but this is unrelated to chemical ischemia.
In the results: "A fully separate set of experiments..." Please explain better what this means.
We have revised the entire section to explain more clearly how recordings were grouped (lines 125-135).
In the results: "...(Syková and Nicholson, 2008) (Figure S3). However, this was not observed for chemical ischemia without postsynaptic failure (Figure S3), in which the increased glutamate transients were observed." This should probably refer to Figure S4.
Thank you for spotting this mistake. We corrected it.
The last sentence in the results "... most likely by increased presynaptic Ca2+ influx, and, at the same time, the postsynaptic response." This is difficult to understand. Does "at the same time" refer to another mechanism or the consequence of more Ca2+?
We revised this part of the results section to improve clarity and toned down our conclusions (lines 328-335 and 363-379).
Reviewer #3 (Recommendations For The Authors):
There are a few points that the author needs to clarify:
The authors do not discuss the different behaviour of iGlu F0 during chemical ischemia and chemical ischemia with postsynaptic failure shown in Figure 2, panels D and E. In the first case, during the application of the solution to induce ischemia, iGluF0 decreases while in the other case, it strongly increases before falling down. In both cases, the fEPSP slope is decreased. How does the author explain this observation?
We attribute the transient increase of extracellular glutamate during prolonged chemical ischemia to the increase of synaptic glutamate release observed previously under such conditions (Hershkowitz et al. 1993; Tanaka et al. 1997) and other mechanisms reviewed by us (Passlick et al. 2021) (e.g., glial glutamate release, transiently reduced glutamate uptake), which we could not detect during shorter chemical ischemia. The initial drop of the fEPSP slope is most likely due to postsynaptic depolarisation, which is followed by a repolarisation if the chemical stress duration is short. We now explain this in more detail in lines 185-200 of the revised manuscript. Although we focussed on the bi-directional effect on longer timescales in this manuscript, this transient phase during chemical ischemia is very interesting for further investigations.
On page 8, first line, I think that the authors meant Figure S4, not Figure S3 when they mentioned results on ECS diffusivity and ECS fraction.
Yes, thank you for spotting this. We corrected the mistake.
In Supplementary Figure 5 panel B It seems that PPR is significantly reduced upon chemical ischemia (asterisk on columns green) but the authors claimed in the paper at page 10 that "Analysing the paired-pulse ratio (PPR) of postsynaptic response and iGluSnFR transients revealed no consistent changes after chemical ischemia (Figure S5).". Did the authors refer to the data normalized in panel D? In this case, I do not see the need to normalize raw data that have been already shown in a previous panel and that give different statistical results, probably due to the different tests used (paired in panel B and not paired in panel D).
We have clarified this point in the supplementary material (Figure S5, legend). There is a relevant difference between the analyses presented in panels B and D. The paired test presented in B analyses the change of the electrophysiological PPR in response to chemical ischemia. The test in D on the electrophysiological PPR asks if the reduction in B is significantly different from the changes seen under control conditions. Because it is not, we conclude that chemical ischemia has no relevant effect on the electrophysiological PPR and, in combination with the results on the iGluSnFR PPR, also not on short-term plasticity, as tested here.
References
Hershkowitz N, Katchman AN, Veregge S. Site of synaptic depression during hypoxia: a patch-clamp analysis. Journal of Neurophysiology 69: 432–441, 1993.
Lauritzen M, Dreier JP, Fabricius M, Hartings JA, Graf R, Strong AJ. Clinical Relevance of Cortical Spreading Depression in Neurological Disorders: Migraine, Malignant Stroke, Subarachnoid and Intracranial Hemorrhage, and Traumatic Brain Injury. J Cereb Blood Flow Metab 31: 17–35, 2011.
Pape N, Rose CR. Activation of TRPV4 channels promotes the loss of cellular ATP in organotypic slices of the mouse neocortex exposed to chemical ischemia. The Journal of Physiology 601: 2975–2990, 2023.
Passlick S, Rose CR, Petzold GC, Henneberger C. Disruption of Glutamate Transport and Homeostasis by Acute Metabolic Stress. Front Cell Neurosci 15: 637784, 2021.
Tanaka E, Yamamoto S, Kudo Y, Mihara S, Higashi H. Mechanisms Underlying the Rapid Depolarization Produced by Deprivation of Oxygen and Glucose in Rat Hippocampal CA1 Neurons In Vitro. Journal of Neurophysiology 78: 891–902, 1997.
-
-
www.biorxiv.org www.biorxiv.org
-
Reviewer #1 (Public Review):
Summary:
This manuscript describes the crystallographic screening of a number of small molecules derived from the natural substrates S-adenosyl methionine (SAM) and adenine, against the SARS-CoV-2 2'-O-methyltransferase NSP16 in complex with its partner NSP10. High-quality structures for several of these are presented together with efforts to evaluate their potential biophysical binding and antiviral activities. The structures are of high quality and the data are well presented but do not yet show potency in biophysical binding. They only offer limited insights into the design of inhibitors of NSP16/10.
Strengths:
The main strengths of the study are the high quality of the structural data and the associated electron density maps, making the structural data highly accurate and informative for future structure-based design. These results are clearly presented and communicated in the manuscript. Another strength is the authors' attempts to probe the binding of the identified fragments using biophysical assays. Although in general the outcome of these experiments shows negative data or very weak binding affinities, the authors should be commended for attempting several techniques and showing the data clearly. This study is also useful as an example of the complexities associated with drug discovery on a bi-substrate target such as a methyltransferase: several of the observed binding poses were unexpected, with compounds that are relatively similar to substrates binding in different parts of the active site or in other unexpected orientations. This serves as an example of how experimental structural information is still of crucial importance to structure-based drug design. In general, the claims in the manuscript are well supported by the data.
Weaknesses:
The main limitations of the study are that the new structures generated in the study are fairly limited in terms of chemical space being similar to either SAM or RNA-CAP analogues. It feels a little bit of a lost opportunity to expand this to more diverse ligands which may reveal potential inhibitors that are distinct from current methyltransferase inhibitors based on SAM analogues and truly allow a selective targeting of this important target.
Another limitation is the potentially misleading nature of the antiviral assays. It is not possible to say if these compounds display on-target activity in these assays or even if the inhibition of NSP16/10 would have any effect in these assays. Whilst the authors do mention these points I think this should be emphasized more strongly.
Minor critical points:
The authors state that their crystals and protein preps have co-purified SAM occupying the active site of the crystals. Presumably, this complicates the interpretation of electron density maps as many of the ligands share overlap with the existing SAM density making traditional analysis of difference maps challenging. The authors did not utilize the PanDDA analysis for this step, perhaps this is related to the presence of SAM in the ground state datasets? Also, occupancies are reported in the manuscript in some cases to two significant figures, this seems to be an overestimation of the ability of refinement to determine occupancy based on density alone and the authors should clarify how these figures were reached.
The molecular docking approach to pre-selection of library compounds to soak did not appear to be successful. Could the authors make any observations about the compounds selected by docking or the docking approach used that may explain this?
-
eLife assessment
This important study describes the crystallographic screening of a number of small molecules against a viral enzyme critical for the 5' capping of SARS-CoV-2 RNA and viral replication. While the high-quality crystal structures and complementary biophysical assays in this study provide solid evidence to support the major claims regarding how these small molecule compounds bind to the viral enzyme, the mismatch between the antiviral activity and binding to the viral enzyme of several small molecule compounds could have been more thoroughly investigated or discussed. This paper would be of interest to the fields of coronavirus biology, structural biology, and drug discovery.
-
Reviewer #2 (Public Review):
Summary:
The study by Kremling et al. examines the nsp16-nsp10 methyltransferase from SARS-CoV-2, with the aim of identifying inhibitors by X-ray crystallography-based compound screening.
A set of 234 compounds was screened, resulting in a set of adenosine-containing compounds or analogues thereof that bind in the SAM site of nsp16-nsp10. The compound selection was mainly based on similarity to SAM and docking of commercially available libraries. The resulting structures are of good quality and clearly show the binding mode of the compounds. It is not surprising to find that these compounds bind in the SAM pocket since they are structurally very similar to portions of SAM. Nevertheless, the result is novel and may be inspirational for the future design of inhibitors. Following up on the crystallographic screen, the identified compounds were tested for antiviral activity and binding to nsp16-nsp10. In addition, an analysis of similar binding sites was presented.
Strengths:
The crystallography is solid and the structures are of good quality. The compound binding constitutes a novel finding.
Weaknesses:
The major weakness is the mismatch between antiviral activity and binding to the target protein. Only one of the compounds could be demonstrated to bind to the nsp16-nsp10 protein. From a displacement experiment using ITC, sangivamycin is concluded to bind with a Kd > 1 mM. However, the same compound displays antiviral activity with an EC50 of 0.01 µM. Even though the authors do not make specific claims that the antiviral effect is due to inhibition of nsp16-nsp10, it is implicit. If the data are included, the manuscript should state specifically that the effect is not likely due to nsp16-nsp10 inhibition.
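To make the scale of this mismatch concrete, the short sketch below compares the two reported values after converting both to molar units. It is only an illustration: the function and variable names are hypothetical, and because the Kd is reported as a lower bound (> 1 mM), the resulting ratio is also a lower bound.

```python
def fold_difference(kd_molar, ec50_molar):
    """Return how many times larger the binding constant is than the EC50."""
    if kd_molar <= 0 or ec50_molar <= 0:
        raise ValueError("concentrations must be positive")
    return kd_molar / ec50_molar


# Values quoted in the review, converted to mol/L.
KD_LOWER_BOUND_M = 1e-3   # Kd > 1 mM, so this is a lower bound
EC50_M = 0.01e-6          # EC50 = 0.01 µM

ratio = fold_difference(KD_LOWER_BOUND_M, EC50_M)
print(f"Kd exceeds EC50 by at least {ratio:.0e}-fold")
```

A gap of roughly five orders of magnitude between the binding constant and the cellular potency is the basis for the concern that the antiviral effect is unlikely to arise from nsp16-nsp10 inhibition.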
The structure of the paper and the language need quite a lot of work to bring them to the expected quality.
Technical point:
Refinement of crystallographic occupancies to single digit percentage is not normally supported by electron density.
-
Author response:
eLife assessment
This important study describes the crystallographic screening of a number of small molecules against a viral enzyme critical for the 5' capping of SARS-CoV-2 RNA and viral replication. While the high-quality crystal structures and complementary biophysical assays in this study provide solid evidence to support the major claims regarding how these small molecule compounds bind to the viral enzyme, the mismatch between the antiviral activity and binding to the viral enzyme of several small molecule compounds could have been more thoroughly investigated or discussed. This paper would be of interest to the fields of coronavirus biology, structural biology, and drug discovery.
We fully agree that the antiviral assay results could be put into better context by clarifying that the antiviral effects of tubercidin and its derivatives are due to off-target effects.
Reviewer #1 (Public Review):
Summary:
This manuscript describes the crystallographic screening of a number of small molecules derived from the natural substrates S-adenosyl methionine (SAM) and adenine, against the SARS-CoV-2 2'-O-methyltransferase NSP16 in complex with its partner NSP10. High-quality structures for several of these are presented together with efforts to evaluate their potential biophysical binding and antiviral activities. The structures are of high quality and the data are well presented but do not yet show potency in biophysical binding. They only offer limited insights into the design of inhibitors of NSP16/10.
Strengths:
The main strengths of the study are the high quality of the structural data, and associated electron density maps making the structural data highly accurate and informative for future structure-based design. These results are clearly presented and communicated in the manuscript. Another strength is the authors' attempts to probe the binding of the identified fragments using biophysical assays. Although in general the outcome of these experiments shows negative data or very weak binding affinities, the authors should be commended for attempting several techniques and showing the data clearly. This study is also useful as an example of the complexities associated with drug discovery on a bi-substrate target such as a methyltransferase: several of the observed binding poses were unexpected, with compounds that are relatively similar to substrates binding in different parts of the active site or in other unexpected orientations. This serves as an example of how experimental structural information is still of crucial importance to structure-based drug design. In general, the claims in the manuscript are well supported by the data.
Weaknesses:
The main limitations of the study are that the new structures generated in the study are fairly limited in terms of chemical space being similar to either SAM or RNA-CAP analogues. It feels a little bit of a lost opportunity to expand this to more diverse ligands which may reveal potential inhibitors that are distinct from current methyltransferase inhibitors based on SAM analogues and truly allow a selective targeting of this important target.
It is true that it makes sense to screen more diverse compounds to expand the ligand set, and we hope our study motivates others to do so. Given the limited number of crystal structures of nsp10-16 with potential drug molecules, the aim of this study was to expand the database with new complex structures, providing a pool of complex structures for future compound designs with increased selectivity. Furthermore, some of the hits are known inhibitors of similar enzymes, and the most prominent and potent methyltransferase inhibitors, like sinefungin and tubercidin, are structurally related to SAM. We think that knowing which SAM compounds or fragments of SAM are able to bind in the nsp10-16 active site is highly valuable for further specific and optimized inhibitor design.
Another limitation is the potentially misleading nature of the antiviral assays. It is not possible to say if these compounds display on-target activity in these assays or even if the inhibition of NSP16/10 would have any effect in these assays. Whilst the authors do mention these points I think this should be emphasized more strongly.
That is a very valid point and we do not believe that the antiviral activity is based on on-target effects. We agree that the way it is currently presented can be considered misleading and we will clarify this point in the revised version.
Minor critical points:
The authors state that their crystals and protein preps have co-purified SAM occupying the active site of the crystals. Presumably, this complicates the interpretation of electron density maps as many of the ligands share overlap with the existing SAM density making traditional analysis of difference maps challenging. The authors did not utilize the PanDDA analysis for this step, perhaps this is related to the presence of SAM in the ground state datasets? Also, occupancies are reported in the manuscript in some cases to two significant figures, this seems to be an overestimation of the ability of refinement to determine occupancy based on density alone and the authors should clarify how these figures were reached.
We have used PanDDA in parallel for hit finding. However, for this target we did not see any advantage over the hit-finding results from visual inspection. As mentioned, this is probably because SAM is present in the “ground state”, which complicates the PanDDA map calculations.
Regarding the occupancies, we fully agree with this comment and will change them to a reasonable number of digits and clarify how the figures were reached.
The molecular docking approach to pre-selection of library compounds to soak did not appear to be successful. Could the authors make any observations about the compounds selected by docking or the docking approach used that may explain this?
Yes, it is a good point to give possible explanations why the docking approach was not successful to facilitate similar approaches in future studies.
Reviewer #2 (Public Review):
Summary:
The study by Kremling et al. examines the nsp16-nsp10 methyltransferase from SARS-CoV-2, with the aim of identifying inhibitors by X-ray crystallography-based compound screening.
A set of 234 compounds was screened, resulting in a set of adenosine-containing compounds or analogues thereof that bind in the SAM site of nsp16-nsp10. The compound selection was mainly based on similarity to SAM and docking of commercially available libraries. The resulting structures are of good quality and clearly show the binding mode of the compounds. It is not surprising to find that these compounds bind in the SAM pocket since they are structurally very similar to portions of SAM. Nevertheless, the result is novel and may be inspirational for the future design of inhibitors. Following up on the crystallographic screen, the identified compounds were tested for antiviral activity and binding to nsp16-nsp10. In addition, an analysis of similar binding sites was presented.
Strengths:
The crystallography is solid and the structures are of good quality. The compound binding constitutes a novel finding.
Weaknesses:
The major weakness is the mismatch between antiviral activity and binding to the target protein. Only one of the compounds could be demonstrated to bind to the nsp16-nsp10 protein. From a displacement experiment using ITC, sangivamycin is concluded to bind with a Kd > 1 mM. However, the same compound displays antiviral activity with an EC50 of 0.01 µM. Even though the authors do not make specific claims that the antiviral effect is due to inhibition of nsp16-nsp10, it is implicit. If the data are included, the manuscript should state specifically that the effect is not likely due to nsp16-nsp10 inhibition.
We do believe that the antiviral data are valuable and should be published within this work. We also agree with the comment that it should be clearly stated that the antiviral effect is not likely because of nsp10-16 inhibition and we will optimize that accordingly.
The structure of the paper and the language need quite a lot of work to bring them to the expected quality.
We will go through the manuscript again and further improve the structure and language as much as possible.
Technical point:
Refinement of crystallographic occupancies to single digit percentage is not normally supported by electron density.
We agree with that point and will correct it in the revised version.
-
-
www.biorxiv.org www.biorxiv.org
-
Author response:
The following is the authors’ response to the previous reviews.
Public Review:
Reviewer #1 (Public Review):
In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development time course. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.
I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3, and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.
We believe this study is an enhancement of our previous work for two reasons, which have been alluded to in new text within the introduction. Firstly, our previous work used experimental and bioinformatic analysis to identify microRNAs with significant regulatory roles during chondrogenesis. This new manuscript additionally uses a systems biology approach to identify novel miRNA-mRNA interactions and capture these within an in silico model. Secondly, this work was initiated by the analysis of our previously generated data, using a novel tool we developed for this type of data (Bioconductor - TimiRGeN).
I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?
We agree with the reviewer that some additional data were needed to demonstrate the effective regulation of miR-199-5p. Hence, Supplementary Figure 1 is now included which provides validation of the effects of miR-199a-5p overexpression (Supplementary Figure 1A) and inhibition of miR-199a/b-5p (Supplementary Figure 1B). Within the main manuscript, Figure 2B has been amended to include the consequences of inhibition of miR-199a-5p, with 2C showing the consequences of miR-199b-5p inhibition. Further, we include new data with regards to miR-199a/b-5p inhibition on CAV1 (Figure 4A).
I had a number of issues with the way in which some of the data was presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels.
We agree with all points made here and have amended these within the manuscript. Figure 1A now shows pathway enrichment plots from the TimiRGeN R Bioconductor package, and the table which previously showed the pathways enriched at each time point is now in the supplementary materials (Supplementary Table 1). Figures 2 and 4 now use color instead of shades of grey. Figure 3C has now been moved to the supplementary materials (Supplementary Figure 2) and is referenced in the text.
Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.
Reviewer #2 (Public Review):
Summary:
This study represents an ambitious endeavor to comprehensively analyze the role of miR199a/b-5p and its networks in cartilage formation. By conducting experiments that go beyond in vitro MSC differentiation models, more robust conclusions can be achieved.
Strengths:
This research investigates the role of miR-199a/b-5p during chondrogenesis using bioinformatics and in vitro experimental systems. The significance of miRNAs in chondrogenesis and OA is crucial, warranting further research, and this study contributes novel insights.
Weaknesses:
While miR-140 and miR-455 are used as controls, these miRNAs have been demonstrated to be more relevant to Cartilage Homeostasis than chondrogenesis itself. Their deficiency has been genetically proven to induce Osteoarthritis in mice. Therefore, the results of this study should be considered in comparison with these existing findings.
We agree with the reviewer's comments. miR-455-null mice develop normally but miR-140-null (or mutated) mice and humans do have skeletal abnormalities (e.g. Nat Med. 2019 Apr;25(4):583-590. doi: 10.1038/s41591-019-0353-2), indicating a role in chondrogenesis. We have made an addition in the discussion to point towards the need to assess the roles miR-199a/b-5p may play during skeletogenesis and OA. We anticipate miR-199a/b-5p to be relevant in OA and have ongoing additional work on this, but this is beyond the scope of this manuscript.
Recommendations For The Authors:
Reviewer #1 (Recommendations For The Authors):
Beyond the issues raised in the public review, I had a few minor recommendations that are largely designed to help improve the understanding of the manuscript as it is currently written.
(1) Please provide the statistical tests used to obtain p-values in the Figure 2 and 4 legends.
We have now added statistical test information to the figure legends of figures 2 and 4.
(2) It is stated on p. 9 that both miRNAs may share a functional repertoire because 25 and 341 genes are intersected between their inhibition experiments. Please provide statistical support that this overlap is an enrichment over the null background in this experiment. Total DE genes – chi squared. Expected / Observed.
A chi-squared test is now presented in the manuscript which shows that the number of significant genes which were found in common between miR-199a-5p knockdown and miR-199b-5p knockdown were significantly more than expected for day 0 or day 1 of the experiments.
(3) The final sentence on p. 12 (beginning 'Size of the points reflect...') seemed out of place - is it part of a legend?
Thank you for pointing out this mistake - it was part of figure 3C and now is in the supplementary materials.
(4) A sentence on p. 14 reads that 'FZD6 and ITGA3 levels increased significantly' but this should read decreased, rather than increased. Quite an important typo!
Thank you for pointing this error out. It has been corrected.
(5) Theoretical transcripts are mentioned in the legend of Figure 5A but these were not present in the figure. Please include these or remove them from the legend.
This error has been removed from Figure 5A.
(6) On p 20, the references 22 and 27 should I think be moved to earlier in the sentence (after 'miR-199a-5p-FZD6 has been predicted previously'). Currently, it reads as if these references support your luciferase assays which you claim are the first evidence for this target relationship.
We agree with this change and have corrected the manuscript.
(7) The reference to Figure 5D on p. 20 should be a reference to Figure 5C.
Thank you for pointing this error out – this has been corrected.
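The overlap test requested in recommendation (2) above can be sketched as follows. This is a minimal illustration of a 2×2 chi-squared test for the overlap between two differentially expressed (DE) gene lists, not the authors' actual analysis: the function name and all gene counts are hypothetical placeholders, the background is assumed to be the set of genes tested in both experiments, and no continuity correction is applied. For one degree of freedom the p-value can be computed with the standard library via `math.erfc`.

```python
import math


def overlap_chi_squared(n_a, n_b, n_both, n_total):
    """Chi-squared test (1 degree of freedom) for overlap of two gene lists.

    n_a, n_b : number of DE genes in experiment A and in experiment B
    n_both   : number of genes DE in both experiments (observed overlap)
    n_total  : number of genes tested in both experiments (background)
    Returns (chi2, p_value, expected_overlap).
    """
    # Observed 2x2 table: rows = in A / not in A, columns = in B / not in B.
    observed = [
        [n_both, n_a - n_both],
        [n_b - n_both, n_total - n_a - n_b + n_both],
    ]
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    expected_overlap = n_a * n_b / n_total
    return chi2, p_value, expected_overlap


# Hypothetical counts, chosen only to show the call; not taken from the study.
chi2, p_value, expected = overlap_chi_squared(
    n_a=200, n_b=500, n_both=25, n_total=15000
)
print(f"expected overlap = {expected:.2f}, chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

Reporting the expected overlap next to the observed overlap, as the reviewer's note suggests, lets readers judge the effect size as well as the p-value.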
Reviewer #2 (Recommendations For The Authors):
(1) The paper is based on the importance of miR-140 and miR-455 as miRNAs in chondrogenesis, citing only Barter, M. J. et al. Stem Cells 33, (2015). Considering the scope and results of this study, this citation is insufficient.
We agree with this reviewer's comments. For many years, miR-140 and miR-455 have been studied experimentally, and their importance in OA research has become apparent. We have included additional references within the introduction to address this.
(2) Analyzing chondrogenesis solely through differentiation experiments from MSCs is inadequate. It is essential to perform experiments involving the network within normal cartilage tissue and/or the generation of knockout mice to understand the precise role of miR199a/b-5p in chondrogenesis.
We have added an additional paragraph in the discussion to state this, and do believe it is highly important that miR-199a/b-5p be tested in OA samples – however this would be beyond the intended scope of this article.
(3) In light of the above points, it is imperative to investigate the role of miR-199a/b-5p beyond the in vitro differentiation model from MSCs, encompassing mouse OA models or human disease samples.
In line with our previous response, we agree with the premise and believe additional experiments should be performed to gain more insight into the mechanism by which miR-199a/b-5p regulates OA. However, developing a new mouse line to investigate this is not within the scope of this manuscript.
-
eLife assessment
This study provides valuable insight into the role of miR-199a/b-5p in cartilage formation. The evidence supporting the significance of the identified miRNA and its target mRNA transcripts is convincing. This paper will likely primarily benefit scientists focused on diseases related to this biological process, such as osteoarthritis. Furthermore, researchers with a broader interest in miRNAs may find the computational model to identify novel RNA-RNA interactions particularly helpful.
-
Reviewer #1 (Public Review):
The comments below are from my review of the first submission of this article. I would now like to thank the authors for their hard work in responding to my comments. I am happy with the changes they have made, in particular the inclusion of further experimental evidence in Figures 2 and 4. I have no further comments to make.
In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development timecourse. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.
I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3 and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.
I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?
I had a number of issues with the way in which some of the data is presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels?
Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.
-
-
-
eLife assessment
This useful study shows the representations that emerge in a recurrent neural network trained on a navigation task by requiring path integration and decodability. The network modeling was solid, but interpretation of neural data and mechanisms was incomplete.
-
Reviewer #1 (Public Review):
Summary:
This work studies representations in a network with one recurrent layer and one output layer that needs to path-integrate so that its position can be accurately decoded from its output. To formalise this problem, the authors define a cost function consisting of the decoding error and a regularisation term. They specify a decoding procedure that at a given time averages the output unit center locations, weighted by the activity of the unit at that time. The network is initialised without position information, and only receives a velocity signal (and a context signal to index the environment) at each timestep, so to achieve low decoding error it needs to infer its position and keep it updated with respect to its velocity by path integration.
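As a rough illustration of the decoding procedure summarised above, the sketch below computes an activity-weighted average of output-unit center locations at a single time step. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, activities are assumed to be non-negative, and the unit centers are taken as given rather than estimated from ratemaps.

```python
def decode_position(activities, centers):
    """Decode a 2D position as the activity-weighted mean of unit centers.

    activities : non-negative output-unit activities at one time step
    centers    : (x, y) center location for each unit, in the same order
    """
    if len(activities) != len(centers):
        raise ValueError("need exactly one center per unit")
    total = sum(activities)
    if total <= 0:
        raise ValueError("at least one unit must be active")
    # Each unit votes for its own center, weighted by how active it is.
    x = sum(a * cx for a, (cx, _) in zip(activities, centers)) / total
    y = sum(a * cy for a, (_, cy) in zip(activities, centers)) / total
    return (x, y)


# Two equally active units: the estimate lies midway between their centers.
estimate = decode_position([1.0, 1.0], [(0.0, 0.0), (2.0, 0.0)])
print(estimate)
```

Because the estimate depends only on the unit centers and their relative activities, a decoder of this form favours units with localised, unimodal activity, which is relevant to the concern raised under Weaknesses about place-like fields being a consequence of the decoding procedure.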
The authors take the trained network and let it explore a series of environments with different geometries while collecting unit activities to probe learned representations. They find localised responses in the output units (resembling place fields) and border responses in the recurrent units. Across environments, the output units show global remapping and the recurrent units show rate remapping. Stretching the environment generally produces stretched responses in output and recurrent units. Ratemaps remain stable within environments and stabilise after noise injection. Low-dimensional projections of the recurrent population activity form environment-specific clusters that reflect the environment's geometry, which suggests independent rather than generalised representations. Finally, the authors discover that the centers of the output unit ratemaps cluster together on a triangular lattice (like the receptive fields of a single grid cell), and find significant clustering of place cell centers in empirical data as well.
The model setup and simulations are clearly described, and are an interesting exploration of the consequences of a particular set of training requirements - here: path integration and decodability. But it is not obvious to what extent the modelling choices are a realistic reflection of how the brain solves navigation. Therefore it is not clear whether the results generalize beyond the specifics of the setup here.
Strengths:
The authors introduce a very minimal set of model requirements, assumptions, and constraints. In that sense, the model can function as a useful 'baseline', that shows how spatial representations and remapping properties can emerge from the requirement of path integration and decodability alone. Moreover, the authors use the same formalism to relate their setup to existing spatial navigation models, which is informative.
The global remapping that the authors show is convincing and well-supported by their analyses. The geometric manipulations and the resulting stretching of place responses, without additional training, are interesting. They seem to suggest that the recurrent network may scale the velocity input by the environment dimensions so that the exact same path integrator-output mappings remain valid (but maybe there are other mechanisms too that achieve the same).
The clustering of place cell peaks on a triangular lattice is intriguing, given there is no grid cell input. It could have something to do with the fact that a triangular lattice provides optimal coverage of 2d space? The included comparison with empirical data is valuable, although the authors only show significant clustering - there is no analysis of its grid-like regularity.
Weaknesses:
The navigation problem that needs to be solved by the model is a bit of an odd one. Without any initial position information, the network needs to figure out where it is, and then path-integrate with respect to a velocity signal. As the authors remark in Methods 4.2, without additional input, the only way to infer location is from border interactions. It is like navigating in absolute darkness. Therefore, it seems likely that the salient wall representations found in the recurrent units are just a consequence of the specific navigation task here; it is unclear if the same would apply in natural navigation. In natural navigation, there are many more sensory cues that help inferring location, most importantly vision, but also smell and whiskers/touch (which provides a more direct wall interaction; here, wall interactions are indirect by constraining velocity vectors). There is a similar but weaker concern about whether the (place cell like) localised firing fields of the output units are a direct consequence of the decoding procedure that only considers activity center locations.
The conclusion that 'contexts are attractive' (heading of section 2) is not well-supported. The authors show 'attractor-like behaviour' within a single context, but there could be alternative explanations for the recovery of stable ratemaps after noise injection. For example, the noise injection could scramble the network's currently inferred position, so that it would need to re-infer its position from boundary interactions along the trajectory. In that case the stabilisation would be driven by the input, not just internal attractor dynamics. Moreover, the authors show that different contexts occupy different regions in the space of low-dimensional projections of recurrent activity, but not that these regions are attractive.
The authors report empirical data that shows clustering of place cell centers like they find for their output units. They report that 'there appears to be a tendency for the clusters to arrange in hexagonal fashion, similar to our computational findings'. They only quantify the clustering, but not the arrangement. Moreover, in Figure 7e they only plot data from a single animal, then plot all other animals in the supplementary. Does the analysis of Fig 7f include all animals, or just the one for which the data is plotted in 7e? If the latter, why that animal? As Appendix C mentions that the ratemap for the plotted animal 'has a hexagonal resemblance' whereas others have 'no clear pattern in their center arrangements', it feels like cherrypicking to only analyse one animal without further justification.
-
Reviewer #2 (Public Review):
Summary:
The authors proposed a neural network model to explore the spatial representations of the hippocampal CA1 and entorhinal cortex (EC) and the remapping of these representations when multiple environments are learned. The model consists of a recurrent network and output units (a decoder) mimicking the EC and CA1, respectively. The major results of this study are: the EC network generates cells with their receptive fields tuned to a border of the arena; the decoder develops neuron clusters arranged in a hexagonal lattice. Thus, the model accounts for entorhinal border cells and CA1 place cells. The authors also suggested that the remapping of place cells occurs between different environments through state transitions corresponding to unstable dynamical modes in the recurrent network.
Strengths:
The authors found a spatial arrangement of receptive fields similar to their model's prediction in experimental data recorded from CA1. Thus, the model proposes a plausible mechanism to generate hippocampal spatial representations without relying on grid cells. This result is consistent with the observation that grid cells are unnecessary to generate CA1 place cells.
The suggestion about the remapping mechanism shows an interesting theoretical possibility.
Weaknesses:
The explicit mechanisms of generating border cells and place cells and those underlying remapping were not clarified at a satisfactory level.
The model cannot generate entorhinal grid cells. Therefore, how the proposed model is integrated into the entire picture of the hippocampal mechanism of memory processing remains elusive.
-
Reviewer #3 (Public Review):
Summary:
The authors used recurrent neural network modelling of spatial navigation tasks to investigate border and place cell behaviour during remapping phenomena.
Strengths:
The neural network training seemed for the most part (see comments later) well-performed, and the analyses used to make the points were thorough.
The paper and ideas were well explained.
Figure 4 contained some interesting and strong evidence for map-like generalisation as environmental geometry was warped.
Figure 7 was striking, and potentially very interesting.
It was impressive that the RNN path-integration error stayed low for so long (Fig A1), given that normally networks that only work with dead-reckoning have errors that compound. I would have loved to know how the network was doing this, given that borders did not provide sensory input to the network. I could not think of many other plausible explanations... It would be even more impressive if it was preserved when the network was slightly noisy.
Weaknesses:
I felt that the stated neuroscience interpretations were not well supported by the presented evidence, for a few reasons I'll now detail.
First, I was unconvinced by the interpretation of the reported recurrent cells as border cells. An equally likely hypothesis seemed to be that they were position cells that linearly encode the x and y position, which, when your environment only contains external linear boundaries, look the same. As in figure 4, in environments with internal boundaries the cells do not encode them; they encode (x,y) position. Further, if I'm not misunderstanding, there is, throughout, a confusing case of broken symmetry. The cells appear to code not for any random linear direction, but for either the x or y axis (i.e. there are x cells and y cells). These look like border cells in environments in which the boundaries are external only, and align with the axes (like square and rectangular ones), but the same also appears to be true in the rotationally symmetric circular environment, which strikes me as very odd. I can't think of a good reason why the cells in circular environments should care about the particular choice of (x,y) axes... unless the choice of position encoding scheme is leaking influence throughout. A good test of this would be differently oriented (45 degree rotated square) or more geometrically complicated (two diamonds connected) environments in which the difference between a pure (x,y) code and a border code is more obvious.
Next, the decoding mechanism used seems to have forced the representation to learn place cells (no other cell type is going to be usefully decodable?). That is, in itself, not a problem. It just changes the interpretation of the results. To be a normative interpretation for place cells you need to show some evidence that this decoding mechanism is relevant for the brain, since this seems to be where they are coming from in this model. Instead, this is a model with place cells built into it, which can then be used for studying things like remapping, which is a reasonable stance.
However, the remapping results were also puzzling. The authors present convincing evidence that the recurrent units effectively form 6 different maps of the 6 different environments (e.g. the sparsity of the code, or fig 6a), with the place cells remapping between environments. Yet, as the authors point out, in neural data the finding is that some cells generalise their co-firing patterns across environments (e.g. grid cells, border cells), while place cells remap, making it unclear what correspondence to make between the authors' network and the brain. There are existing normative models that capture both entorhinal's consistent and hippocampus' less consistent neural remapping behaviour (Whittington et al. and probably others); what, then, have we learnt from this exercise?
One striking result was figure 7, the hexagonal arrangement of place cell centres. I had one question that I couldn't find the answer to in the paper, which would change my interpretation. Are place cell centres within a single cluster of points in figure 7a, for example, from one cell across the 100 trajectories, or from many? If each cluster belongs to a different place cell then the interpretation seems like some kind of optimal packing/coding of 2D space by a set of place cells, an interesting prediction. If multiple place cells fall within a single cluster then that's a very puzzling suggestion about the grouping of place cells into these discrete clusters. From figure 7c I guess that the former is the likely interpretation, from the fact that clusters appear to maintain the same colour, and are unlikely to be co-remapping place cells, but I would like to know for sure!
I felt that the neural data analysis was unconvincing. Most notably, the statistical effect was found in only one of seven animals. Random noise is likely to pass statistical tests 1 in 20 times (at a p value threshold of 0.05); this seems like it could have been something similar? Further, the data was compared to a null model in which place cell fields were randomly distributed. The authors claim place cell fields have two properties that the random model doesn't: (1) clustering to edges (as experimentally reported) and (2) much more provocatively, a hexagonal lattice arrangement. The test seems to conflate the two; I think that nearby ball radii could be overrepresented, as in figure 7f, due to either effect. I would have liked to see a computation of the statistic for a null model in which place cells were random but with a bias towards the boundaries of the environment that matches the observed changing density, to distinguish these two hypotheses.
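The "1 in 20" concern can be made concrete with a short calculation. The following is a minimal sketch, assuming the seven per-animal tests are independent and each is run at a threshold of 0.05; the function names are illustrative and do not come from the manuscript.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability that at least one of n independent tests passes by chance."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_animals = 7   # number of animals mentioned in this review
alpha = 0.05    # conventional per-test threshold (assumption)

fwer = familywise_error(alpha, n_animals)
per_test = bonferroni_threshold(alpha, n_animals)
print(f"Chance of at least one false positive: {fwer:.3f}")
print(f"Bonferroni-corrected per-animal threshold: {per_test:.4f}")
```

Under these assumptions the chance of at least one spurious pass is roughly 30%, so a single significant animal out of seven is weak evidence on its own.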
Some smaller weaknesses:
- Had the models trained to convergence? From the loss plot it seemed like not, and when including regularisers, recent work (grokking phenomena, e.g. Nanda et al. 2023) has shown the importance of letting the regulariser minimise completely to see the resulting effect. Else you are interpreting representations that are likely still being learnt, a dangerous business.
- Since RNNs are nonlinear, it seems that eigenvalues larger than 1 don't necessarily mean unstable?
- Why do you not include a bias in the networks? ReLU networks without bias are not universal function approximators, so it is a real change in architecture that doesn't seem to have any positives?
- The claim that this work provided a mathematical formalism of the intuitive idea of a cognitive map seems strange, given that upwards of 10 of the works this paper cites also mathematically formalise a cognitive map into a similar integration loss for a neural network.
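Two of the smaller points above (eigenvalues larger than 1, and bias-free ReLU networks) can be illustrated with toy examples. The sketch below is a generic illustration, not a reproduction of the authors' network: the 2x2 recurrent matrix, the layer sizes, and all names are assumptions chosen for clarity. The first part shows a recurrent matrix with eigenvalues +2 and -2 whose ReLU dynamics nevertheless collapse to the origin; the second shows that a ReLU network without biases satisfies f(a·x) = a·f(x) for a > 0, which forces f(0) = 0 and rules out, for example, non-zero constant functions.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def run(step, h0, n_steps):
    """Iterate a state-update function n_steps times from h0."""
    h = list(h0)
    for _ in range(n_steps):
        h = step(h)
    return h


# --- Point 1: eigenvalues larger than 1 without divergence in a ReLU RNN ---
# Toy recurrent matrix (assumption); its eigenvalues are +2 and -2.
W_rec = [[0.0, -2.0],
         [-2.0, 0.0]]

h0 = [1.0, 0.5]
h_linear = run(lambda h: matvec(W_rec, h), h0, 10)      # norm doubles each step
h_relu = run(lambda h: relu(matvec(W_rec, h)), h0, 10)  # clipped to zero


# --- Point 2: a bias-free ReLU network is positively homogeneous ---
rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical layer sizes: 2 inputs, two hidden layers of 8 units, 1 output.
layers = [random_matrix(8, 2), random_matrix(8, 8), random_matrix(1, 8)]


def bias_free_net(x):
    h = list(x)
    for W in layers[:-1]:
        h = relu(matvec(W, h))
    return matvec(layers[-1], h)[0]


x = [0.3, -0.7]
scale = 2.5
y = bias_free_net(x)
y_scaled = bias_free_net([scale * v for v in x])

print("linear state after 10 steps:", h_linear)
print("ReLU state after 10 steps:  ", h_relu)
print("f(scale * x) - scale * f(x):", y_scaled - scale * y)
print("f(0):", bias_free_net([0.0, 0.0]))
```

The first part only shows that the spectrum of a weight matrix does not settle stability once a nonlinearity is applied; whether this matters for the authors' analysis depends on which matrix their eigenvalues were computed from.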
Aim Achieved? Impact/Utility/Context of Work
Given the listed weaknesses, I think this was a thorough exploration of how this network with these losses is able to path-integrate its position and remap. This is useful; it is good to know how another neural network with slightly different constraints learns to perform these behaviours. That said, I do not think the link to neuroscience was convincing, and as such, it has not achieved its stated aim of explaining these phenomena in biology. The mechanism for remapping in the entorhinal module seemed fundamentally different to the brain's, instead using completely disjoint maps; the recurrent cell types described seemed to match no described cell type (no bad thing in itself, but it does limit the permissible neuroscience claims) either in tuning or remapping properties, with a potentially worrying link between an arbitrary encoding choice and the responses; and the striking place cell prediction was unconvincingly matched by neural data. Further, this is a busy field in which many remapping results have been shown before by similar models, limiting the impact of this work. For example, George et al. and Whittington et al. show remapping of place cells across environments; Whittington et al. study remapping of entorhinal codes; and Rajkumar Vasudeva et al. 2022 show similar place cell stretching results under environmental shifts. As such, this paper's contribution is muddied significantly.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This fundamental study uncovers the detailed structural mechanisms by which covalent and non-covalent synthetic ligands can simultaneously occupy the binding pocket of the nuclear receptor transcription factor PPARγ. Supported by a compelling set of structural, biochemical, and biophysical data, the findings challenge the reliability of two widely used covalent inhibitors and have broader implications for nuclear receptor research. This study will interest structural biologists and biochemists investigating the binding mechanisms of ligands targeting the nuclear receptor superfamily.
-
Reviewer #1 (Public Review):
Summary:
PPARgamma is a nuclear receptor that binds to orthosteric ligands to coordinate transcriptional programs that are critical for adipocyte biogenesis and insulin sensitivity. Consequently, it is a critical therapeutic target for many diseases, but especially diabetes. The malleable nature and promiscuity of the PPARgamma orthosteric ligand binding pocket have confounded the development of improved therapeutic modulators. Covalent inhibitors have been developed but they show unanticipated mechanisms of action depending on which orthosteric ligands are present. In this work, Shang and Kojetin present a compelling and comprehensive structural, biochemical, and biophysical analysis that shows how covalent and noncovalent ligands can co-occupy the PPARgamma ligand binding pocket to elicit distinctive preferences of coactivator and corepressor proteins. Importantly, this work shows how the covalent inhibitors GW9662 and T0070907 may be unreliable tools as pan-PPARgamma inhibitors despite their widespread use.
Strengths:
- Highly detailed structure and functional analyses provide a comprehensive structure-based hypothesis for the relationship between PPARgamma ligand binding domain co-occupancy and allosteric mechanisms of action.
- Multiple orthogonal approaches are used to provide high-resolution information on ligand binding poses and protein dynamics.
- The large number of x-ray crystal structures solved for this manuscript should be applauded along with their rigorous validation and interpretation.
Weaknesses:
- Inclusion of statistical analysis is missing in several places in the text.
- Functional analysis beyond coregulator binding is needed.
-
Reviewer #2 (Public Review):
Summary:
The flexibility of the ligand binding domain (LBD) of NRs allows various modes of ligand binding leading to various cellular outcomes. In the case of PPARγ, it's known that two ligands can co-bind to the receptor. However, whether a covalent inhibitor functions by blocking the binding of a non-covalent ligand, or co-binds in a manner that weakens the binding of a non-covalent ligand, remains unclear. In this study, the authors first used TR-FRET and NMR to demonstrate that covalent inhibitors (such as GW9662 and T0070907) weaken but do not prevent non-covalent synthetic ligands from binding, likely via an allosteric mechanism. The AF-2 helix can exchange between active and repressive conformations, and covalent inhibitors shift the conformation toward a transcriptionally repressive one to reduce the orthosteric binding of the non-covalent ligands. By co-crystal studies, the authors further reveal the structural details of various non-covalent ligand binding mechanisms in a ligand-specific manner (e.g., an alternate binding site, or a new orthosteric binding mode by altering the covalent ligand binding pose).
Strengths:
The biochemical and biophysical evidence presented is strong and convincing.
Weaknesses:
However, the co-crystal studies were performed by soaking non-covalent ligands into LBD pre-crystallized with a covalent inhibitor. Since the covalent inhibitors would shift the LBD toward a transcriptionally repressive conformation, which reduces orthosteric binding of non-covalent ligands, would a similar conclusion be drawn if the sequence was reversed (i.e., soaking a covalent inhibitor into LBD pre-crystallized with a non-covalent ligand)? Additional discussion will broaden the implications of the conclusion.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
The authors' innovative use of single-cell sequencing combined with physiological phenotyping of 5 different Parkinson's models in Drosophila provides compelling support for the important conclusion that these different models have a shared convergent effect on olfactory projection neuron (OPN) dysfunction. The effect on OPNs occurs early in disease progression, and likely underlies anosmia observed as an early symptom of PD. Additional experiments and analysis are required to support the authors' suggestions that: (a) the defect in these models is specific to cholinergic OPNs; (b) OPN degeneration is (causally) connected to dopaminergic neuron (DAN) degeneration; and (c) the observed motor defects are a reasonable measure of DAN dysfunction.
-
Reviewer #1 (Public Review):
This is a fantastic, comprehensive, timely, and landmark pan-species work that demonstrates the convergence of multiple familial PD mutations onto a synaptic program. It is extremely well written and I have only a few comments that do not require additional data collection.
Major Comments:
(1) In the functional experiments performing calcium imaging on projection neurons I could not find a count of cell bodies across conditions. Since the loss of OPNs could explain the reduced calcium signal, this is a critical control to perform. A differential abundance test on the single-cell data would also suffice here and be easy for the authors to perform with their existing data.
(2) One of the authors' conclusions is that cholinergic neurons and the olfactory system are acutely impacted by these PD mutations. However, I wonder if this is the case:
a. Most Drosophila excitatory neurons are cholinergic and only a subpopulation appears to be dysregulated by these mutations. The authors point out that visual neurons also have many DEGs; couldn't the visual system also be dysregulated in these flies? Is there something special about these cholinergic neurons versus other cholinergic neurons in the fly brain? I wonder if they can leverage their nice dataset to say something about vulnerability.
b. As far as I can tell, the cross-species analysis of DEGs (Figure 3) is agnostic to neuronal cell type, although the conclusion seems to suggest only cholinergic neurons were contrasted. Is this correct? Could you please clarify this in the text, as it's an important detail. If not, have the authors tried comparing only cholinergic neuron DEGs across species? That would lend strength to their specificity argument. The results for the NBM are impressive. Could the authors add more detail to the main text here about other regions?
c. Uniquely within the human data, are cholinergic neurons more dysregulated than others? I understand this is not an early timepoint but it would still be useful to discuss.
d. In the discussion, the authors say that olfactory neurons are uniquely poised to be dysregulated as they are large and have high activity. Is this really true compared to other circuits? I didn't find the references convincing and I am not sure this has been borne out in electron microscopy reconstructions for anatomy.
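For the cell-count control raised in comment (1), one simple form of differential abundance test is Fisher's exact test on a 2x2 table of cell counts (genotype by cell type). The sketch below is only an illustration under stated assumptions: the counts are toy placeholders, not data from the manuscript; the function name is hypothetical; and pooling cells in this way treats them as independent, which ignores replicate-to-replicate variability that a full analysis would need to model.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the total probability, under fixed margins, of all tables
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Toy placeholder counts (NOT from the manuscript):
# rows = genotype (control, mutant); columns = (OPNs, all other cells).
toy_table = [[8, 2],
             [1, 5]]

p_value = fisher_exact_two_sided(toy_table[0][0], toy_table[0][1],
                                 toy_table[1][0], toy_table[1][1])
print(f"two-sided p-value for the toy table: {p_value:.5f}")
```

A non-significant result on the real counts would argue against OPN loss as the explanation for the reduced calcium signal, whereas a significant depletion would make the cell-body count control essential.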
-
Reviewer #2 (Public Review):
Summary:
Pech et al selected 5 Parkinson's disease-causing genes, and generated multiple Drosophila lines by replacing the Drosophila lrrk, rab39, auxilin (aux), synaptojanin (synj), and Pink1 genes with wild-type and pathogenic mutant human or Drosophila cDNA sequences. First, the authors performed a panel of assays to characterize the phenotypes of the models mentioned above. Next, by using single-cell RNA-seq and comparing fly data with human postmortem tissue data, the authors identified multiple cell clusters being commonly dysregulated in these models, highlighting the olfactory projection neurons. Next, by using selective expression of Ca2+-sensor GCaMP3 in the OPN, the authors confirmed the synaptic impairment in these models, which was further strengthened by olfactory performance defects.
Strengths:
Overall, the authors investigated the functionality of PD-related mutations at endogenous levels and found a very interesting shared pathway through single-cell analysis. More importantly, they performed nice follow-up work using multiple assays.
Weaknesses:
While the authors state this is a new collection of five familial PD knock-in models, the AuxR927G model has been published and carefully characterized in Jacquemyn et al., 2023. ERG has been performed for Aux R927G in Jacquemyn et al., 2023, but the findings are different from what's shown in Figure 1b and Supplementary Figure 1d, which the authors should try to explain. Moreover, according to the authors, the hPINK1 control was the expression of human PINK1 with UAS-hPINK1 and nsyb-Gal4 due to technical obstacles. Having the PINK1 WT control be an overexpression model makes it difficult to interpret the PINK1 mutant phenotypes. The study would be strengthened if the authors used UAS-hPINK1 and nsyb-Gal4 (or maybe a ubiquitous Gal4) to rescue the hPink1L347P and hPink1P399L phenotypes. In addition, although the authors picked these models to target different biology/pathways, Aux and Synj both act in related steps of clathrin-mediated endocytosis, with LRRK2 being an accessory regulatory protein. Therefore, is the data set more favorable for identifying synaptic-related defects?
GH146-GAL4+ PNs are derived from three neuroblast lineages, producing both cholinergic and GABAergic inhibitory PNs (Li et al, 2017). Therefore, OPNs comprise more than "cholinergic projection neurons". How do we know from single-cell data that cholinergic neurons were more vulnerable across 5 models?
In Figure 1b, the authors assumed that locomotion defects were caused by dopaminergic neuron dysfunction. However, to better support this, the authors should perform rescue experiments using dopaminergic neuron-specific Gal4 drivers. Otherwise, the authors may consider staining DA neurons and performing cell counting. Furthermore, the authors stated in the discussion that "We now place cholinergic failure firmly ahead of dopaminergic system failure in flies", which feels rushed; the evidence is insufficient to draw such a conclusion, especially given that no experimental evidence related to DA neuron dysfunction was provided in this manuscript.
It is interesting to see that different familial PD mutations converge onto synapses. The authors have suggested that different mechanisms may be involved directly through regulating synaptic functions, or indirectly through mitochondria or transport. The manuscript would be improved if the authors extended their analysis in Figure 3 and better utilized their single-cell data to dissect the mechanisms. For example, for all the candidates listed in Figure 3C, are they all altered in the same direction across 5 models?
While this approach is carefully performed, the authors should state in the discussions the strengths and the caveats of the current strategy. For example, what kind of knowledge have we gained by introducing these mutations at an endogenous locus? Are there any caveats of having scRNAseq at day 5 only but being compared with postmortem human disease tissue?
-
Reviewer #3 (Public Review):
Summary:
This study investigates the cellular and molecular events leading to hyposmia, an early dysfunction in Parkinson's disease (PD), which develops up to 10 years prior to motor symptoms. The authors use five Drosophila knock-in models of familial PD genes (LRRK2, RAB39B, PINK1, DNAJC6 (Aux), and SYNJ1 (Synj)), three expressing human genes and two Drosophila genes with equivalent mutations.
The authors carry out single-cell RNA sequencing of young fly brains and single-nucleus RNA sequencing of human brain samples. The authors found that cholinergic olfactory projection neurons (OPN) were consistently affected across the fly models, showing synaptic dysfunction before the onset of motor deficits, known to be associated with dopaminergic neuron (DAN) dysfunction.
Single-cell RNA sequencing revealed significant transcriptional deregulation of synaptic genes in OPNs across all five fly PD models. This synaptic dysfunction was confirmed by impaired calcium signalling and morphological changes in synaptic OPN terminals. Furthermore, these young PD flies exhibited olfactory behavioural deficits that were rescued by selective expression of wild-type genes in OPNs.
Single-nucleus RNA sequencing of post-mortem brain samples from PD patients with LRRK2 risk mutations revealed similar synaptic gene deregulation in cholinergic neurons, particularly in the nucleus basalis of Meynert (NBM). Gene ontology analysis highlighted enrichment for processes related to presynaptic function, protein homeostasis, RNA regulation, and mitochondrial function.
This study provides compelling evidence for the early and primary involvement of cholinergic dysfunction in PD pathogenesis, preceding the canonical DAN degeneration. The convergence of familial PD mutations on synaptic dysfunction in cholinergic projection neurons suggests a common mechanism contributing to early non-motor symptoms like hyposmia. The authors also emphasise the potential of targeting cholinergic neurons for early diagnosis and intervention in PD.
Strengths:
This study presents a novel approach, combining multiple mutants to identify salient disease mechanisms. The quality of the data and analysis is of a high standard, providing compelling evidence for the role of OPN neurons in olfactory dysfunction in PD. The comprehensive single-cell RNA sequencing data from both flies and humans is a valuable resource for the research community. The identification of consistent impairments in cholinergic olfactory neurons, at early disease stages, is a powerful finding that highlights the convergent nature of PD progression. The comparison between fly models and human patients' brains provides strong evidence of the conservation of molecular mechanisms of disease, which can be built upon in further studies using flies to prove causal relationships between the defects described here and neurodegeneration.
The identification of specific neurons involved in olfactory dysfunction opens up potential avenues for diagnostic and therapeutic interventions.
Weaknesses:
The causal relationship between early olfactory dysfunction and later motor symptoms in PD remains unclear. It is also uncertain whether this early defect contributes to neurodegeneration or is simply a reflection of the sensitivity of olfactory neurons to cellular impairments. The study does not investigate whether the observed early olfactory impairment in flies leads to later DAN deficits. Additionally, the single-cell RNA sequencing analysis reveals several affected neuronal populations that are not further explored. The main weakness of the paper is the lack of conclusive evidence linking early olfactory dysfunction to later disease progression. The rationale behind the selection of specific mutants and neuronal populations for further analysis could be better qualified.
-
-
www.biorxiv.org www.biorxiv.org
-
eLife assessment
This study uses state-of-the-art methods to label endogenous dopamine receptors in a subset of Drosophila mushroom body neuronal types. The authors report that Dop1R1 and Dop2R receptors, which have opposing effects on intracellular cAMP, are present in axon termini of Kenyon cells, as well as those of two classes of dopaminergic neurons that innervate the mushroom body, indicative of autocrine modulation by dopaminergic neurons. Additional experiments showing opposing effects of starvation on Dop1R1 and Dop2R levels in mushroom body neurons are consistent with a role for dopamine receptor levels increasing the efficiency of learned food-odour associations in starved flies. Supported by solid data, this is a valuable contribution to the field.
-
Reviewer #1 (Public Review):
Summary:
This is an important and interesting study that uses the split-GFP approach. Localization of receptors and correlating them to function is important in understanding the circuit basis of behavior.
Strengths:
The split-GFP approach allows visualization of subcellular enrichment of dopamine receptors in the plasma membrane of GAL4-expressing neurons allowing for a high level of specificity.
The authors resolve the presynaptic localization of Dop1R1 and Dop2R in "giant" Drosophila neurons differentiated from cytokinesis-arrested neuroblasts in culture, as it is not clear in the lobes and calyx.
Starvation-induced opposite responses of dopamine receptor expression in the PPL1 and PAM DANs provide key insights into models of appetitive learning.
Starvation-induced increase in D2R allows for increased negative feedback that the authors test in D2R knockout flies where appetitive memory is diminished.
This dual autoreceptor system is an attractive model for how amplitude and kinetics of dopamine release can be fine-tuned and controlled depending on the cellular function and this paper presents a good methodology to do it and a good system where the dynamics of dopamine release can be tested at the level of behavior.
Weaknesses:
LI measurements of Kenyon cells and lobes indicate that Dop2R was approximately twice as enriched in the lobe as the average density across the whole neuron, while the lobe enrichment of Dop1R1 was about 1.5 times the average. Are these levels consistent across different times of the day and states of the animal? How were these conditions controlled, and how sensitive is receptor expression to the time of day of dissection, staining, etc.?
The authors assume, without discussing why or how, that presynaptic enrichment of these receptors is similar in giant neurons and the MB.
Figures 1-3 show the extensive expression of receptors in alpha and beta lobes, while Figure 5 focusses on PAM and localization in γ and β' projections of PAM, leading to the conclusion that presynaptic dopamine neurons express these receptors and have feedback regulation. Consistency between lobes or discussion of these differences is important to consider.
Receptor expression in any learning-related MBONs is not discussed, and it would be intriguing to know how receptors are organized in those cells. Given that these PAMs provide input to both KCs and MBONs, these will have to work in some coordination.
Although the authors use the D2R enhancement post-starvation to show that knocking down receptors eliminated appetitive memory, the knockout affects multiple neurons within this circuit, including PAMs and KCs. How does that account for the observed effect? Are those not important for appetitive learning?
The evidence for fine-tuning is based entirely on receptor expression and one behavioral outcome, which could result from many possibilities. It is not clear whether fine-tuning of dopamine release through presynaptic feedback regulation is a realistic possibility. Alternate hypotheses and outcomes could be considered in the model, as it is not completely substantiated by the data, at least as presented.
-
Reviewer #2 (Public Review):
Summary:
Hiramatsu et al. investigated how cognate neurotransmitter receptors with antagonizing downstream effects localize within neurons when co-expressed. They focus on mapping the localization of the dopaminergic Dop1R1 and Dop2R receptors, which correspond to the mammalian D1- and D2-like dopamine receptors and have opposing effects on intracellular cAMP levels, in neurons of the Drosophila mushroom body (MB). To visualize specific receptors in single neuron types within the crowded MB neuropil, the authors use existing dopamine receptor alleles tagged with 7 copies of split GFP to target reconstitution of GFP tags only in the neurons of interest as a read-out of receptor localization. The authors show that both Dop1R1 and Dop2R, to differing degrees, are enriched in axonal compartments of both the Kenyon Cells cholinergic presynaptic inputs and in different dopamine neurons (DANs), which project axons to the MB. Co-localization studies of dopamine receptors with the presynaptic marker Brp suggest that Dop1R1 and, to a larger extent, Dop2R localize in the proximity of release sites. This localization pattern in DANs suggests that Dop1R1 and Dop2R work in dual-feedback regulation as autoreceptors. Finally, they provide evidence that the balance of Dop1R1 and Dop2R in the axons of two different DAN populations is differentially modulated by starvation and that this regulation plays a role in regulating appetitive behaviors.
Strengths:
The authors use reconstitution of GFP fluorescence of split GFP tags knocked into the endogenous locus at the C-terminus of the dopamine receptors as a readout of dopamine receptor localization. This elegant approach preserves the endogenous transcriptional and post-transcriptional regulation of the receptor, which is essential for studies of protein localization.
The study focuses on mapping the localization of dopamine receptors in neurons of the mushroom body. This is an excellent choice of system to address the question posed in this study, as the neurons are well-studied, and their connections are carefully reconstructed in the mushroom body connectome. Furthermore, the role of this circuit in different behaviors and associative memory permits the linking of patterns of receptor localization to circuit function and resulting behavior. Because of these features, the authors can provide evidence that two antagonizing dopamine receptors can act as autoreceptors within the axonal compartment of MB innervating DANs. The differential regulation of the balance of the two receptors under starvation in two distinct DAN innervations provides evidence of the role that regulation of this balance can play in circuit function and behavioral output.
Weaknesses:
The approach of using endogenously tagged alleles to study localization is a strength of this study, but the authors do not provide sufficient evidence that the insertion of 7 copies of split GFP at the C terminus of the dopamine receptors does not interfere with the endogenous localization pattern or function. Both sets of tagged alleles (1X Venus and 7X split GFP tagged) were previously reported (Kondo et al., 2020), but only the 1X Venus tagged alleles were further functionally validated in assays of olfactory appetitive memory. Although the 7X split-GFP array tag is smaller than the 1X Venus tag knocked into the same location, the reconstitution of 7 copies of GFP at the C terminus of the dopamine receptor might substantially increase the molecular bulk at this site, potentially impeding the function of the receptor more significantly than the smaller, single Venus tag. The data presented by Kondo et al. (2020) are insufficient to conclude that the two alleles are equivalent.
The authors' conclusion that the receptors localize to presynaptic sites is weakly supported. The analysis of the colocalization of whole-brain staining for the active zone marker Brp with dopamine receptors labeled in specific neurons is insufficient to conclude that the receptors are localized at presynaptic sites. Given the highly crowded neuropil environment, the data cannot differentiate between receptor localization postsynaptic to a dopamine release site and localization at a presynaptic site within the same neuron. The known distribution of presynaptic sites within the neurons analyzed in the study provides evidence that the receptors are enriched in axonal compartments, but co-labeling of presynaptic sites and receptors in the same neuron or super-resolution methods are needed to provide evidence of receptor localization at active zones. The data presented in Figures 5K-5L provide compelling evidence that the receptors localize to neuronal varicosities in DANs where the receptors could play a role as autoreceptors.
Given the highly crowded environment of the mushroom body neuropil, the analysis of dopamine receptor localization in Kenyon cells is not conclusive. The data are sufficient to conclude that the receptors preferentially localize to the axonal compartment of Kenyon cells, but co-localization with brain-wide Brp active zone immunostaining is not sufficient to determine whether the receptor localizes juxtaposed to dopaminergic release sites, in proximity to release sites in Kenyon cells, or both.
-
-
www.medrxiv.org
-
eLife assessment
This important work, leveraging state-of-the-art whole-night sleep EEG-fMRI methods, advances our understanding of the brain states underlying sleep and wakefulness. Despite a small sample size, the authors present convincing evidence for substates within N2 and REM sleep stages, with reliable transition structure, supporting the perspective that there are more than the five canonical sleep/wake states.
-
Reviewer #1 (Public Review):
Summary:
The study reports fundamental findings on dynamic functional brain states during sleep. Twenty-one HMM states were identified from the fMRI data, exceeding the number of EEG-defined sleep stages and allowing sub-states of N2 and REM to be defined. Importantly, these findings were reproducible over two nights, shedding new light on the dynamics of brain function during sleep.
Strengths:
The study provides the most compelling evidence on the sub-states of both REM and N2 sleep. Moreover, the authors showed that these findings on dynamic states and their transitions were reproducible over two nights of sleep. These novel findings offer unique information to the field of sleep neuroimaging.
Weaknesses:
The only weakness of this study has been acknowledged by the authors: limited sample size.
-
Reviewer #2 (Public Review):
Summary:
Yang and colleagues used a Hidden Markov Model (HMM) on whole-night fMRI to isolate sleep and wake brain states in a data-driven fashion. They identify more brain states (21) than the five sleep/wake stages described in conventional PSG-based sleep staging, show that the identified brain states are stable across nights, and characterize the brain states in terms of which networks they primarily engage.
Strengths:
This work's primary strengths are its dataset of two nights of whole-night concurrent EEG-fMRI (including REM sleep), and its sound methodology.
Weaknesses:
The study's weaknesses are its small sample size and the limited attempts at relating the identified fMRI brain states back to EEG.
General appraisal:
The paper's conclusions are generally well-supported, but some additional analyses and discussions could improve the work.
The authors' main focus lies in identifying fMRI-based brain states, and they succeed at demonstrating both the presence and robustness of these states in terms of cross-night stability. Characterizing the brain states in terms of the networks they primarily engage adds further insight.
A somewhat missed opportunity is the absence of more analyses relating the HMM states back to EEG. It would be very helpful to the sleep field to see how EEG spectra of, say, different N2-related HMM states compare. Similarly, it is presently unclear whether anything noticeable happens within the EEG time course at the moment of an HMM class switch (particularly when the PSG stage remains stable). While the authors did look at slow wave density and various physiological signals in different HMM states, a characterization of the EEG itself in terms of spectral features is missing. Such analyses might have shown that fMRI-based brain states map onto familiar EEG substates, or reveal novel EEG changes that have so far gone unnoticed.
It is unclear how the presently identified HMM brain states relate to the previously identified NREM and wake states by Stevner et al. (2019), who used a roughly similar approach. This is important, as similar brain states across studies would suggest reproducibility, whereas large discrepancies could indicate a large dependence on particular methods and/or the sample (also see later point regarding generalizability).
More justice could be done to previous EEG-based efforts moving beyond conventional AASM-defined sleep/wake states. Various EEG studies performed data-driven clustering of brain states, typically indicating more than 5 traditional brain states (e.g., Koch et al. 2014, Christensen et al. 2019, Decat et al. 2022). Beyond that, countless subdivisions of classical sleep stages have been proposed (e.g., phasic/tonic REM, N2 with/without spindles, N3 with global/local slow waves, cyclic alternating patterns, and many more). While these aren't incorporated into standard sleep stage classification, the current manuscript could be misinterpreted to suggest that improved/data-driven classifications cannot be achieved from EEG, which is incorrect.
More discussion of the limitations of the current sample and generalizability would be helpful. A sample of N=12 is no doubt impressive for two nights of concurrent whole-night EEG-fMRI. Still, any data-driven approach can only capture the brain states that are present in the sample, and 12 individuals are unlikely to express all brain states present in the population of young healthy individuals. Add to that all the potentially different or altered brain states that come with healthy ageing, other demographic variables, and numerous clinical disorders. How do the authors expect their results to change with larger samples and/or varying these factors? Perhaps most importantly, I think the authors should mention that the particular number of identified brain states (here 21, and e.g. 19 in Stevner et al. 2019) is not set in stone and will likely vary as a function of many sample- and methods-related factors.
-