  1. Nov 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their overall positive evaluation of the manuscript and for finding MChIP-C to be a valuable technological advance. To address the reviewers’ helpful comments and recommendations, we performed several additional analyses and improved the text and figures.

      Briefly, we extended and clarified the main text and methods, added analyses of interactions at consensus and method-specific CTCF/DHS sites (Figure S3), added additional comparison tracks to other methods in specific loci (Figure 4), added examples of MChIP-C E-P interactions at previously-verified loci (Figure S2a) and added extensive MChIP-C downsampling analysis (Figure S6).

      Recommendations for authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Provide .HiC and .cool files for the community to explore the data.

      We thank the reviewer for this suggestion. We have uploaded both the raw and processed data to GEO. We note that .cool and .hic formats may be less useful for this type of data, since it includes only promoter-based interactions and thus the resulting interaction matrix is extremely sparse at the relevant resolutions. In addition, we provide an online genomic browser for our data.

      (2) Provide an R or bioconda package for future data processing.

      We thank the reviewer for this suggestion. We have organized and streamlined the relevant code for processing MChIP-C data and it is available as a github repository.

      (3) The authors should avoid using "mln" for "million".

      We thank the reviewer for this suggestion. We have corrected this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 2- A handful of sites identified by MChIP-C should be verified by 3C or 4C to validate they are true interactions using an orthogonal approach.

      We thank the reviewer for this suggestion. As we show in the current manuscript (and supported by several papers using MNase-based C-methods), C-methods based on restriction enzymes are considerably less sensitive than those based on MNase, so using these methods for anecdotal validation may not be adequate. In addition, it is difficult to extract accurate quantitative measurements from 3C and 4C due to challenges in bias normalization. As a large-scale alternative, we analyzed a set of consensus promoter-CTCF and promoter-DHS interactions identified by all 3 methods (PLAC-seq/Micro-C/MChIP-C; Figure S3). We find that MChIP-C shows clearly superior resolution and sensitivity on these consensus sites. In fact, even for sites which were only called by one of the competing methods, we still see better signal in the MChIP-C data (suggesting that our simplistic MChIP-C peak-calling approach could be improved for further gain). However, as this analysis focuses on “easily detectable” consensus sites, we also emphasize the importance of inspecting interactions which are not detected clearly by alternative methods. To this end, we now show in our manuscript interaction profiles for 11 loci (MYC, PTGER3, CITED2, BTG1, ANTXR2, SEMA7A, LMO2, GATA1, HBG2, VEGFA, MYB), each showing high-resolution MChIP-C interactions which coincide with expected genomic features (p300, CTCF, H3K27ac, known enhancers) and are not clearly observable in Micro-C and PLAC-seq. We also note that the extended overlap of detected MChIP-C interactions with functionally validated enhancers (as measured by CRISPRi) provides an additional large-scale orthogonal validation.

      (2) A supplemental table indicating read pair depth, etc, similar to S02, should be added for the datasets used for comparison (HiChIP-etc). Given the age differences between some of the reference data used, it may represent simply an improvement by increasing sequencing depth rather than a true technical advantage.

      We thank the reviewer for this suggestion. We have added the sequencing depths of the relevant datasets in the methods section. We also performed extensive downsampling analyses as explained in response to the next point.

      (3) I would recommend performing a downsampling analysis to determine at what point the MChIP-C data reaches saturation in terms of the number of reads, with a comparison to the HiChIP reference data. This would allow a more objective measure of the sensitivity of the assays with reference to read depth.

      We thank the reviewer for this suggestion. First, we note that downsampling does not affect the high sensitivity and resolution results as shown in aggregate plots (e.g. Figure 2 and Figure S3). However, downsampling can affect individual peak calling. We thus downsampled our data to 50%, approximately matching the number of total informative reads of both PLAC-seq and Micro-C (i.e. ~20M). We also further downsampled our data to 25% and 10%. With respect to prediction of K562 functionally validated enhancer-promoter interactions (Figure S6b), even at 25% downsampling MChIP-C achieves both a higher recall and higher precision than the other methods, with a slightly higher false-positive rate. At 10% sampling, recall is slightly worse than Micro-C and PLAC-seq, but both the precision and false-positive rate are better than the alternatives. With respect to saturation, we plotted the number of unique distal cis read pairs versus the total number of reads (Figure S6c), and find that our MChIP-C data does not yet show saturation. We also show that downsampling our data to 50% maintains  ~80% of the called interactions (Figure S6d).
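      The downsampling, saturation and precision/recall comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `saturation_curve` and `classification_metrics`, the toy read pairs and the toy enhancer-promoter labels are all hypothetical, and nested subsamples are drawn here by taking prefixes of a single shuffle, which is only one reasonable way to downsample.

```python
import random


def saturation_curve(read_pairs, fractions, seed=0):
    """Return (total_reads, unique_read_pairs) for nested random subsamples.

    Assumption: subsamples are prefixes of one shuffled list, so each
    smaller subsample is contained in every larger one.
    """
    rng = random.Random(seed)
    shuffled = list(read_pairs)
    rng.shuffle(shuffled)
    curve = []
    for fraction in sorted(fractions):
        n = int(len(shuffled) * fraction)
        curve.append((n, len(set(shuffled[:n]))))
    return curve


def classification_metrics(called, positives, negatives):
    """Precision, recall and false-positive rate of a set of called pairs.

    `positives` / `negatives` stand in for functionally validated and
    functionally rejected enhancer-promoter pairs (e.g. from CRISPRi).
    """
    tp = len(called & positives)
    fp = len(called & negatives)
    fn = len(positives - called)
    tn = len(negatives - called)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr


if __name__ == "__main__":
    # Toy data: 1000 read pairs with many duplicates (hypothetical).
    toy_pairs = [(i % 50, i % 7) for i in range(1000)]
    for total, unique in saturation_curve(toy_pairs, [0.1, 0.25, 0.5, 1.0]):
        print(f"{total:5d} reads -> {unique:4d} unique pairs")

    called = {"E1-P1", "E2-P1", "E3-P2"}
    validated = {"E1-P1", "E2-P1", "E4-P2"}
    rejected = {"E3-P2", "E5-P3", "E6-P3"}
    print(classification_metrics(called, validated, rejected))
```

      A curve of unique pairs that keeps rising with total reads indicates that the library is not yet saturated, which is the behaviour reported for Figure S6c; the three metrics correspond to the quantities compared across methods in Figure S6b.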

      (4) "our results suggest that MChIP-C achieves superior sensitivity and resolution compared to C-methods based on standard restriction enzymes." The sensitivity claims are supported by Figure 2, but not the resolution claims. This is particularly challenging when using histone marks since they can be broad. To directly compare the resolution of MChIP-C to other approaches such as ChIA-PET or HiChIP CTCF or a similar DNA binding protein is required.

      We thank the reviewer for this suggestion. We first note that actually both sensitivity and resolution are relevant for the results shown in Figure 2 and for the signal-to-noise calculations. This is because the low resolution of PLAC-seq peaks can result in very broad peaks that cover the entire area of the interrogated window (5kb on each side), which could seem like low sensitivity. However, we believe that the new Figure S3 may show the higher resolution of MChIP-C more clearly, as do the 11 locus interaction profiles tracks shown in Figure 2, Figure 4 and Figure S2.

      Public reviews:

      Reviewer #1:

      The authors presented a new MNase-based proximity ligation method called MChIP-C, allowing for the measurement of protein-mediated chromatin interactions at single-nucleosome resolution on a genome-wide scale. With improved resolution and sensitivity, they explored the spatial connectivity of active promoters and identified the potential candidates for establishing/maintaining E-P interactions. Finally, with published CRISPRi screens, they found that most functionally verified enhancers do physically interact with their cognate promoters, supporting the enhancer-promoter looping model.

      The study's experimental approach and findings are interesting. However, several issues need to be addressed.

      (1) The authors described that "the lack of interaction between experimentally-validated enhancers and their cognate promoters in some studies employing C-methods has raised doubts regarding the classical promoter-enhancer looping model", so it's intriguing to see whether the MChIP-C could indeed detect the E-P interactions which were not identified by C-methods as they mentioned (Benabdallah et al., 2019; Gupta et al., 2017). I agree that they identified more E-P interactions using MChIP-C, but specifically, they should show at least 2-3 cases. It's important since this is the main conclusion the authors want to draw.

      We thank the reviewer for this suggestion. As we show in the current manuscript (and supported by several papers using MNase-based C-methods), C-methods based on restriction enzymes are considerably less sensitive than those based on MNase, so using these methods for anecdotal validation may not be useful. In addition, it is difficult to extract accurate quantitative measurements from 3C and 4C due to challenges in bias normalization. As a large-scale alternative, we analyzed a set of consensus promoter-CTCF and promoter-DHS interactions identified by all 3 methods (PLAC-seq/Micro-C/MChIP-C; new Figure S3). We find that MChIP-C shows clearly superior resolution and sensitivity on these consensus sites. However, as this analysis focuses on “easily detectable” consensus sites, we also emphasize the importance of inspecting interactions which are not detected clearly by alternative methods. To this end, we now show in our manuscript interaction profiles for 11 loci (MYC, PTGER3, CITED2, BTG1, ANTXR2, SEMA7A, LMO2, GATA1, HBG2, VEGFA, MYB), each showing high-resolution MChIP-C interactions which coincide with expected genomic features (p300, CTCF, H3K27ac, known enhancers) and are not clearly observable in Micro-C and PLAC-seq. We also note that the extended overlap of detected MChIP-C interactions with functionally validated enhancers (as measured by CRISPRi) provides an additional large-scale orthogonal validation.

      (2) The authors compared their data to those of Chen et al. (Chen et al., 2022), who used PLAC-seq with anti-H3K4me3 antibodies in K562 cells and standard Micro-C data previously reported for K562, concluding that "MChIP-C achieves superior sensitivity and resolution compared to C-methods based on standard restriction enzymes.". This is not convincing since they only compared their data to one dataset. More datasets from other cell lines should be included.

      We thank the reviewer for this suggestion. We would like to clarify that all datasets in the paper are K562 datasets, and this cell line is unique in the availability of CRISPRi screens, PLAC-Seq, Micro-C, and hundreds of ChIP-Seq tracks for it. We would expect datasets from other cell types to differ in their regulatory interactions, so they would be less suitable for direct comparison. In addition, the general resolution and sensitivity limitations (e.g. due to restriction fragment size) are not dependent on cell type and have been shown in other MNase-based method papers.

      (3) The reasons for choosing Chen's data (Chen et al., 2022) and CRISPRi screens (Fulco et al., 2019; Gasperini et al., 2019) should be provided since there are so many out there.

      We thank the reviewer for this comment. We selected these CRISPRi screen datasets since they match the cell type (K562) which we used for MChIP-C, and we selected the PLAC-seq data as it is the only PLAC-seq/HiChIP dataset which matches both the cell type (K562) and the antibody (H3K4me3).

      (4) The authors identify EP300 histone acetyltransferase and the SWI/SNF remodeling complex as potential candidates for establishing and/or maintaining enhancer-promoter interactions, but not RNA polymerase II, mediator complex, YY1, and BRD4. More explanation is needed for this point since they're previously suggested to be associated with E-P interactions.

      We thank the reviewer for this comment. We apologize for this point being unclear: as Figure S5 shows, we actually did identify Pol2, Mediator, YY1 and BRD4 as predictive features, but P300 and SWI/SNF show somewhat higher predictive power. We have now clarified this in the text.

      (5) The limitations of the method should be discussed.

      We thank the reviewer for this suggestion. We have now added to the text a discussion of what we view as the current main limitation of the method, namely its low fraction of informative reads.

      Reviewer #2:

      Summary:

      Golov et al performed the capture of MChIP-C using the H3K4me3 antibody. The new method significantly increases the resolution of Micro-C and can detect clear interactions which are not well described in the previous HiChIP/PLAC-seq method. Overall, the paper represents a significant technological advance that can be valuable to the 3D genomic field in the future.

      Strengths:

      (1) The authors established a novel method to profile the promoter center genomic interactions based on the Micro-C method. Such a method could be very useful to dissect the enhancer promoter interaction which has long been an issue for the popular HiC method.

      (2) With the MChIP-C method the authors are able to find new genomic interactions with promoter regions enriched in CTCF. The author has significantly increased the detection sensitivity of such methods as PLAC-seq, Micro-C, and HiChIP.

      (3) The authors identified a new type of interaction between the CTCF-less promoter and the CTCF binding site. This particular type of interaction could explain the CTCF's function in regulating gene transcription activity as observed in many studies. I personally think the second stripe model of P-CTCF interaction is more likely as this has been proposed for the super-enhancer stripe model before. The author should also discuss this part of the story more.

      Weaknesses:

      (1) The data presentation should include the contact heat map. The current data presentation makes it hard for the readers to have a comprehensive view of pair-wise interactions between promoters and the PIR. In particular, these maps may directly give answers to the proposed model of promoter-CTCF interactions by the authors in Figure 3a.

      We thank the reviewer for this suggestion. We note that since the data mainly includes promoter-based interactions, the resulting interaction matrix is extremely sparse at the relevant resolutions. Specifically with respect to promoter-CTCF interactions, without a good sampling of the entire interaction matrix it is difficult to confidently distinguish between the two models only based on MChIP-C data, as it would require data about interaction between non-promoter regions and CTCF.

      (2) In Fig 3D, there seems a very limited increase of power predicting MChIP-C signal for DHS-promoter pairs beyond the addition of CTCF. This figure could be simplified with fewer factors.

      We thank the reviewer for this suggestion. We agree that the last factors do not add predictive power, but we do not think this overly complicates the figure and we prefer to leave these for the reader to evaluate.

      (3) The current method seems to have a big fraction of unusable reads. How the authors process the data should be included to allow for future reproduction. Ideally, the authors should generate a package on R or Bioconda for this processing.

      We thank the reviewer for this suggestion. We agree that the fraction of informative reads is small with respect to some other methods, and expect future versions of MChIP-C to address this limitation. We have organized and streamlined the relevant code for processing MChIP-C data and it is available as a github repository.

      Reviewer #3:

      Summary:

      This manuscript represents a technological development, specifically a micrococcal nuclease chromatin capture approach, termed MChIP-C, to identify promoter-centered chromatin interactions at single nucleosome resolution via a specific protein, similar to HiChIP, ChIA-PET, etc. In general, the manuscript is technically well done. Two major issues raise concerns that need to be addressed. First, it does not appear that novel chromatin interactions identified by MChIP-C which were missed by other approaches such as HiChIP, were validated. This is central to the argument of "improved" sensitivity, which is one of the key factors to assess sensitivity. Second is the question of resolution. Because the authors focus on a histone mark (H3K4me3) it is unclear whether the resolution of the assay truly exceeds other approaches, especially microC. These two issues are not completely supported by the data provided.

      Strengths:

      The method appears to hold promise to improve both the sensitivity and resolution of protein-centered chromatin capture approaches.

      Weaknesses:

      (1) Specific validation experiments to demonstrate the identification of previously missed novel interactions are missing.

      We thank the reviewer for this suggestion. Given that such interactions are missed by Micro-C and PLAC-seq, it would not make sense to use these methods for validation. We thus propose that MChIP-C interactions can be validated by their overlap with expected genomic features. To this end, we now show in our manuscript interaction profiles for 11 loci (MYC, PTGER3, CITED2, BTG1, ANTXR2, SEMA7A, LMO2, GATA1, HBG2, VEGFA, MYB), each showing high-resolution MChIP-C interactions which coincide with expected genomic features (p300, CTCF, H3K27ac, known enhancers) and are not clearly observable in Micro-C and PLAC-seq. In addition, the higher overlap of MChIP-C interactions with functionally-validated K562 enhancer-promoter interactions (provided by CRISPRi screens) provides further functional validation for novel MChIP-C interactions.

      (2) It is unclear if the resolution is really superior based on the data provided.

      We thank the reviewer for this comment. We first note that actually both sensitivity and resolution are relevant for the results shown in Figure 2 and for the signal-to-noise calculations. This is because the low resolution of PLAC-seq peaks can result in very broad peaks that cover the entire area of the interrogated window (5kb on each side), which could seem like low sensitivity. However, we believe that the new Figure S3 may show the higher resolution of MChIP-C more clearly, as do the 11 locus interaction profiles tracks shown in Figure 2, Figure 4 and Figure S2.

      (3) It is unclear how much advantage the approach has, especially compared to existing approaches such as HiChIP since sequencing depth as a variable is not adequately addressed.

      We thank the reviewer for this comment. First, we note that downsampling does not affect the high sensitivity and resolution results as shown in aggregate plots (e.g. Figure 2 and Figure S3). However, downsampling can affect individual peak calling. We thus downsampled our data to 50%, approximately matching the number of total informative reads of both PLAC-seq and Micro-C (i.e. ~20M). We also further downsampled our data to 25% and 10%. With respect to prediction of K562 functionally validated enhancer-promoter interactions (Figure S6b), even at 25% downsampling MChIP-C achieves both a higher recall and higher precision than the other methods, with a slightly higher false-positive rate. At 10% sampling, recall is slightly worse than Micro-C but both the precision and false-positive rate are better than the alternatives.

    1. Reviewer #3 (Public review):

      Summary:

      Krwawicz et al., present evidence that expression of DNMTs in E. coli results in (1) introduction of alkylation damage that is repaired by AlkB; (2) confers hypersensitivity to alkylating agents such as MMS (and exacerbated by loss of AlkB); (3) confers hypersensitivity to oxidative stress (H2O2 exposure); (4) results in a modest increase in ROS in the absence of exogenous H2O2 exposure; and (5) results in the production of oxidation products of 5mC, namely 5hmC and 5fC, leading to cellular toxicity. The findings reported here have interesting implications for the concept that such genotoxic and potentially mutagenic consequences of DNMT expression (resulting in 5mC) could be selectively disadvantageous for certain organisms. The other aspect of this work which is important for understanding the biological endpoints of genotoxic stress is the notion that DNA damage per se somehow induces elevated levels of ROS.

      Strengths:

      The manuscript is well-written, and the experiments have been carefully executed providing data that support the authors' proposed model presented in Fig. 7 (Discussion, sources of DNA damage due to DNMT expression).

      Weaknesses:

      (1) The authors have established an informative system relying on expression of DNMTs to gauge the effects of such expression and subsequent induction of 3mC and 5mC on cell survival and sensitivity to an alkylating agent (MMS) and exogenous oxidative stress (H2O2 exposure). The authors state (p4) that Fig. 2 shows that "Cells expressing either M.SssI or M.MpeI showed increased sensitivity to MMS treatment compared to WT C2523, supporting the conclusion that the expression of DNMTs increased the levels of alkylation damage." This is a confusing statement and requires revision, as ALL cells shown in Fig. 2 are expressing DNMTs and have been treated with MMS. It is the absence of AlkB combined with the expression of DNMTs that causes the MMS sensitivity.

      (2) It would be important to know whether the increased sensitivity (toxicity) to DNMT expression and MMS is also accompanied by substantial increases in mutagenicity. The authors should explain in the text why mutation frequencies were not also measured in these experiments.

      (3) Materials and Methods. ROS production monitoring. The "Total Reactive Oxygen Species (ROS) Assay Kit" has not been adequately described. Who is the Vendor? What is the nature of the ROS probes employed in this assay? Which specific ROS correspond to "total ROS"?

      (4) The demonstration (Fig. 4) that DNMT expression results in elevated ROS and its further synergistic increase when cells are also exposed to H2O2 is the basis for the authors' discussion of DNA damage-induced increases in cellular ROS. S. cerevisiae does not possess DNMTs/5mC, yet exposure to MMS also results in substantial increases in intracellular ROS (Rowe et al, (2008) Free Rad. Biol. Med. 45:1167-1177. PMC2643028). The authors should be aware of previous studies that have linked DNA damage to intracellular increases in ROS in other organisms and should comment on this in the text.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript proposes that 5mC modifications to DNA, despite being ancient and widespread throughout life, represent a vulnerability, making cells more susceptible to both chemical alkylation and, of more general importance, reactive oxygen species. Sarkies et al take the innovative approach of introducing an enzymatic genome-wide cytosine methylation system (DNA methyltransferases, DNMTs) into E. coli, which normally lacks such a system. They provide compelling evidence that the introduction of DNMTs increases the sensitivity of E. coli to chemical alkylation damage. Surprisingly they also show DNMTs increase the sensitivity to reactive oxygen species and propose that the DNMT-generated 5mC presents a target for the reactive oxygen species that is especially damaging to cells. Evidence is presented that DNMT activity directly or indirectly produces reactive oxygen species in vivo, which is an important discovery if correct, though the mechanism for this remains obscure.

      Strengths:

      This work is based on an interesting initial premise, it is well-motivated in the introduction and the manuscript is clearly written. The results themselves are compelling.

      We thank the reviewer for their positive response to our study.  We also really appreciate the thoughtful comments raised.  Adding the considerations raised below to the manuscript will considerably strengthen our findings.

      Weaknesses:

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specific points below.

      (1) As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single stranded DNA (PMID:20663718) occurring at ~20,000 times the rate as double stranded DNA. Consequently, the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single stranded DNA tracts when repaired in an AlkB deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double strand breaks, especially when they contain DNA polymerase stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2 or TET based conversion of 5mC to 5hmC or 5fC, neither of which can be directly repaired and both of which depend on single strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been considered.

      We thank the reviewer for this interesting and insightful suggestion.  Our interpretation of our findings is that a subset of MMS-induced DNA damage, specifically 3mC, overlaps with the damage introduced by DNMTs and this accounts for increased sensitivity to MMS when DNMTs are expressed.  However, the idea that the introduction of 3mC by DNMT actually makes the DNA more liable to damage by MMS, potentially through increasing the level of ssDNA, is also a potential explanation, which could operate in addition to the mechanism that we propose.

      (2) The authors emphasise the non-additivity of the MMS + DNMT + alkB experiment but the interpretation of the result is essentially an additive one: that both MMS and DNMT are introducing similar/same damage and AlkB acts to remove it. The non-additivity noted would seem to be more consistent with the ssDNA model proposed in #1. More generally non-additivity would also be seen if the survival to DNA methylation rate is non-linear over the range of the experiment, for example if there is a threshold effect where some repair process is overwhelmed. The linearity of MMS (and H2O2) exposure to survival could be directly tested with a dilution series of MMS (H2O2).

      We thank the reviewer for this point.  As in the response to point #1, the reviewer’s hypothesis of increased potency of MMS, potentially through increased ssDNA, downstream of 3mC induction by DNMT, is a good one.  The reviewer’s suggestion would produce a highly non-linear response to MMS treatment in the AlkB mutant in the DNMT background.  We therefore agree that investigating non-linearity over a wider range, rather than inferring it from the non-additivity of a single point, would be useful in evaluating the results, and we will add a dose-response curve for DNMT-expressing cells to MMS to the revised version of the manuscript.
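      The non-additivity argument in point (2) can be made concrete with a small worked calculation. This is only a sketch under a stated assumption that does not come from the manuscript or the review: two independent insults are taken to multiply surviving fractions, so the "additive" expectation is additive on a log scale. The function names and all survival values below are hypothetical.

```python
import math


def expected_independent_survival(s_a, s_b):
    """Expected surviving fraction if two insults act independently.

    Assumption (not from the source): independence means surviving
    fractions multiply, i.e. effects add on a log scale.
    """
    return s_a * s_b


def interaction_log_ratio(s_a, s_b, s_ab):
    """log10(observed / expected) for the combined treatment.

    ~0 : consistent with independent action
    <0 : combined treatment is more toxic than expected (synergy)
    >0 : combined treatment is less toxic than expected
    """
    return math.log10(s_ab / expected_independent_survival(s_a, s_b))


if __name__ == "__main__":
    # Hypothetical surviving fractions, for illustration only.
    s_dnmt = 0.5      # DNMT expression alone
    s_mms = 0.4       # MMS alone
    s_both = 0.02     # DNMT expression plus MMS
    print(expected_independent_survival(s_dnmt, s_mms))
    print(interaction_log_ratio(s_dnmt, s_mms, s_both))
```

      A single strongly negative log ratio cannot by itself distinguish a shared-lesion model from a threshold or ssDNA-based model, which is why repeating the comparison across a dilution series of MMS (or H2O2), as the reviewer proposes, is more informative than one dose.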

      (3) The substantial transcriptional changes induced by DNMT expression (Supplemental Figure 4) are a cause for concern and highlight that the ectopic introduction of methylation into a complex system is potentially more confounded than it may at first seem. Though the expression analysis shows bulk transcription properties, my concern is that the disruptive influence of methylation in a system not evolved with it adds not just consistent transcriptional changes but transcriptional heterogeneity between cells which could influence net survival in a stressed environment. In practice I don't think this can be controlled for, possibly quantified by single-cell RNA-seq but that is beyond the reasonable scope of this paper.

      We fully agree with the reviewer and, indeed, we are very interested in what is driving the transcriptional changes that we observed.  Work is currently underway in the lab to investigate this further but, as the reviewer suggests, this is beyond the scope of this paper.  However, we will include a more extensive comment about the transcriptional changes in the discussion of the revised manuscript.

      (4) Figure 4 represents a striking result. From its current presentation it could be inferred that DNMTs are actively promoting ROS generation from H2O2 and also to a lesser extent in the absence of exogenous H2O2. That would be very surprising and a major finding with far-reaching implications. It would need to be further validated, for example by in vitro reconstitution of the reaction and monitoring ROS production. Rather, I think the authors are proposing that some currently undefined, indirect consequence of DNMT activity promotes ROS generation, especially when exogenous H2O2 is available. It would help if this were clarified.

      We thank the reviewer for picking this up.  In the current version’s discussion, we raised two possible explanations for why DNMT (even without H2O2) increases the ROS levels.  One idea is direct activity of DNMT, and one is through the product of DNMT activity acting as a platform to generate more ROS from endogenous or exogenous sources.  We argued that direct activity is less likely, exactly as the reviewer points out.  It is, however, not impossible and we agree with the reviewer that, if it were to be the case, it would be a striking result.  In the revised version of the manuscript we will include an experiment to test whether DNMTs can generate ROS in vitro, which may provide preliminary evidence to distinguish between the two hypotheses we raised, and we will also edit the text of the discussion to clarify our reasoning. 

      Reviewer #2 (Public review):

      5-methylcytosine (5mC) is a key epigenetic mark in DNA and plays a crucial role in regulating gene expression in many eukaryotes including humans. The DNA methyltransferases (DNMTs) that establish and maintain 5mC, are conserved in many species across eukaryotes, including animals, plants, and fungi, mainly in a CpG context. Interestingly, 5mC levels and distributions are quite variable across phylogenies with some species even appearing to have no such DNA methylation.

      This interesting and well-written paper discusses the continuation of some of the authors' work published several years ago. In that previous paper, the laboratory demonstrated that DNA methylation pathways coevolved with DNA repair mechanisms, specifically with the alkylation repair system. Specifically, they discovered that DNMTs can introduce alkylation damage into DNA, specifically in the form of 3-methylcytosine (3mC). (This appears to be an error in the DNMT enzymatic mechanism where the generation of 3mC, as opposed to its preferred product 5-methylcytosine (5mC), is caused by the flipped target cytosine binding to the active site pocket of the DNMT in an inverted orientation.) The presence of 3mC is potentially toxic and can cause replication stress, which this paper suggests may explain the loss of DNA methylation in different species. They further showed that the ALKB2 enzyme plays a crucial role in repairing this alkylation damage, further emphasizing the link between DNA methylation and DNA repair.

      The co-evolution of DNMTs with DNA repair mechanisms suggests there can be distinct advantages and disadvantages of DNA methylation to different species which might depend on their environmental niche. In environments that expose species to high levels of DNA damage, high levels of 5mC in their genome may be disadvantageous. This present paper sets out to examine the sensitivity of an organism to genotoxic stresses such as alkylation and oxidation agents as the consequence of DNMT activity. Since such a study in eukaryotes would be complicated by DNA methylation controlling gene regulation, these authors cleverly utilize Escherichia coli (E. coli) and incorporate into it the DNMTs from other bacteria that methylate the cytosines of DNA in a CpG context like that observed in eukaryotes; the active sites of these enzymes are very similar to eukaryotic DNMTs and basically utilize the same catalytic mechanism (also this strain of E. coli does not specifically degrade this methylated DNA).

      The experiments in this paper more than adequately show that E. coli expressing these DNMTs (compared with the same strain without the DNMTs) does indeed show increased sensitivity to alkylating agents, and that this sensitivity was even greater than expected when a DNA repair mechanism was inactivated. Moreover, they show that E. coli expressing this DNMT is more sensitive to oxidizing agents such as H2O2 and has exacerbated sensitivity when a DNA repair glycosylase is inactivated. Both propensities suggest that DNMT activity itself may generate additional genotoxic stress. Intrigued that DNMT expression itself might induce sensitivity to oxidative stress, the experimenters used a fluorescent sensor to show that H2O2-induced reactive oxygen species (ROS) are markedly enhanced with DNMT expression. Importantly, they show that DNMT expression alone gave rise to increased ROS amounts and that H2O2 addition and DNMT expression together have a greater effect than the linear combination of the two separately. They also carefully checked that the increased sensitivity to H2O2 was not caused by some effect of DNMT expression and activity on the expression of detoxification genes. Finally, by using mass spectrometry, they show that DNMT expression led to production of the 5mC oxidation derivatives 5-hydroxymethylcytosine (5hmC) and 5-formylcytosine (5fC) in DNA. 5fC is a substrate for base excision repair while 5hmC is not; more 5fC was observed. Introduction of non-bacterial enzymes that produce 5hmC and 5fC into the DNMT-expressing bacteria again showed a greater sensitivity than expected. Remarkably, in their assay with addition of H2O2, bacteria showed no growth with this dual expression of DNMT and these enzymes.

      Overall, the authors conduct well thought-out and simple experiments to show that a disadvantageous consequence of DNMT expression leading to 5mC in DNA is increased sensitivity to oxidative stress as well as alkylating agents.

      Again, the paper is well-written and organized. The hypotheses are well-examined by simple experiments. The results are interesting and can impact many scientific areas, from our understanding of the evolutionary pressures that the environment places on an organism to our understanding of how the environment of a malignant cell in the human body may lead to cancer.

      We thank the reviewer for their response to our study, and value the time taken to produce a public review that will aid readers in understanding the key results of our study. 

      Reviewer #3 (Public review):

      Summary:

      Krwawicz et al. present evidence that expression of DNMTs in E. coli (1) introduces alkylation damage that is repaired by AlkB; (2) confers hypersensitivity to alkylating agents such as MMS (exacerbated by loss of AlkB); (3) confers hypersensitivity to oxidative stress (H2O2 exposure); (4) results in a modest increase in ROS in the absence of exogenous H2O2 exposure; and (5) results in the production of oxidation products of 5mC, namely 5hmC and 5fC, leading to cellular toxicity. The findings reported here have interesting implications for the concept that such genotoxic and potentially mutagenic consequences of DNMT expression (resulting in 5mC) could be selectively disadvantageous for certain organisms. The other aspect of this work that is important for understanding the biological endpoints of genotoxic stress is the notion that DNA damage per se somehow induces elevated levels of ROS.

      Strengths:

      The manuscript is well-written, and the experiments have been carefully executed providing data that support the authors' proposed model presented in Fig. 7 (Discussion, sources of DNA damage due to DNMT expression).

      Weaknesses:

      (1) The authors have established an informative system relying on expression of DNMTs to gauge the effects of such expression and subsequent induction of 3mC and 5mC on cell survival and sensitivity to an alkylating agent (MMS) and exogenous oxidative stress (H2O2 exposure). The authors state (p4) that Fig. 2 shows that "Cells expressing either M.SssI or M.MpeI showed increased sensitivity to MMS treatment compared to WT C2523, supporting the conclusion that the expression of DNMTs increased the levels of alkylation damage." This is a confusing statement and requires revision, as ALL cells shown in Fig. 2 are expressing DNMTs and have been treated with MMS. It is the absence of AlkB together with the expression of DNMTs that causes the MMS sensitivity.

      We thank the reviewer for this and agree that this needs to be clarified with regard to the figure presented; we will do so in the revised manuscript.

      (2) It would be important to know whether the increased sensitivity (toxicity) to DNMT expression and MMS is also accompanied by substantial increases in mutagenicity. The authors should explain in the text why mutation frequencies were not also measured in these experiments.

      This is an important point because it is not immediately obvious that increased sensitivity would be associated with increased mutagenicity (if, for example, 3mC was never a cause of inaccurate DNA repair even in the absence of AlkB). We will carry out this experiment and include these data in the revised version of the manuscript. Detailed consideration of the types and sources of mutations is beyond the scope of this manuscript, but we are also working on this and hope to produce data on this in the future.

      (3) Materials and Methods. ROS production monitoring. The "Total Reactive Oxygen Species (ROS) Assay Kit" has not been adequately described. Who is the Vendor? What is the nature of the ROS probes employed in this assay? Which specific ROS correspond to "total ROS"?

      The ROS measurement was made with a kit from ThermoFisher: https://www.thermofisher.com/order/catalog/product/88-5930-74. The probe is DCFH-DA. This is a general ROS sensor that is oxidised by a large number of cellular reactive oxygen species; hence we cannot attribute the signal to a single species. Use of a technique with the potential to identify the species involved more precisely is something we plan to do in future, but is beyond what we can do as part of this study. We will include a comment to this effect in the revised version of the manuscript.

      (4) The demonstration (Fig. 4) that DNMT expression results in elevated ROS and its further synergistic increase when cells are also exposed to H2O2 is the basis for the authors' discussion of DNA damage-induced increases in cellular ROS. S. cerevisiae does not possess DNMTs/5mC, yet exposure to MMS also results in substantial increases in intracellular ROS (Rowe et al, (2008) Free Rad. Biol. Med. 45:1167-1177. PMC2643028). The authors should be aware of previous studies that have linked DNA damage to intracellular increases in ROS in other organisms and should comment on this in the text.

      We thank the reviewer for this point. We note that the increase in ROS that we observed occurs in the presence of DNMTs alone and in the presence of H2O2, not in the presence of MMS; however, the point that DNA damage in general can promote increased ROS in some circumstances is well taken, and we will include a comment on this in the discussion of the revised version.

    3. eLife Assessment

      This important work advances our understanding of DNA methylation and its consequences for susceptibility to DNA damage. This work presents evidence that DNA methylation can accentuate the genomic damage propagated by DNA damaging agents as well as potentially being an independent source of such damage. The experimental results reported are sound but the evidence presented to support the conclusions drawn is incomplete and other interpretations are possible. The work will be of broad interest to biochemists, cell and genome biologists.

    4. Reviewer #1 (Public review):

      Summary:

      The manuscript proposes that 5mC modifications to DNA, despite being ancient and widespread throughout life, represent a vulnerability, making cells more susceptible to both chemical alkylation and, of more general importance, reactive oxygen species. Sarkies et al. take the innovative approach of introducing an enzymatic genome-wide cytosine methylation system (DNA methyltransferases, DNMTs) into E. coli, which normally lacks such a system. They provide compelling evidence that the introduction of DNMTs increases the sensitivity of E. coli to chemical alkylation damage. Surprisingly, they also show that DNMTs increase the sensitivity to reactive oxygen species and propose that the DNMT-generated 5mC presents a target for the reactive oxygen species that is especially damaging to cells. Evidence is presented that DNMT activity directly or indirectly produces reactive oxygen species in vivo, which is an important discovery if correct, though the mechanism for this remains obscure.

      Strengths:

      This work is based on an interesting initial premise, it is well motivated in the introduction, and the manuscript is clearly written. The results themselves are compelling.

      Weaknesses:

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specific points below.

      (1) As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single-stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single-stranded DNA (PMID:20663718), occurring at ~20,000 times the rate seen in double-stranded DNA. Consequently, the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single-stranded DNA tracts when repaired in an AlkB-deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double-strand breaks, especially when they contain DNA polymerase-stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2- or TET-based conversion of 5mC to 5hmC or 5fC, neither of which can be directly repaired and both of which depend on single-strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been considered.

      (2) The authors emphasise the non-additivity of the MMS + DNMT + alkB experiment, but the interpretation of the result is essentially an additive one: that both MMS and DNMT are introducing similar/same damage and AlkB acts to remove it. The non-additivity noted would seem to be more consistent with the ssDNA model proposed in #1. More generally, non-additivity would also be seen if the relationship between survival and DNA methylation rate is non-linear over the range of the experiment, for example if there is a threshold effect where some repair process is overwhelmed. The linearity of survival with respect to MMS (and H2O2) exposure could be directly tested with a dilution series of MMS (H2O2).

      (3) The substantial transcriptional changes induced by DNMT expression (Supplemental Figure 4) are a cause for concern and highlight that the ectopic introduction of methylation into a complex system is potentially more confounded than it may at first seem. Though the expression analysis shows bulk transcription properties, my concern is that the disruptive influence of methylation in a system not evolved with it adds not just consistent transcriptional changes but transcriptional heterogeneity between cells which could influence net survival in a stressed environment. In practice I don't think this can be controlled for, possibly quantified by single-cell RNA-seq but that is beyond the reasonable scope of this paper.

      (4) Figure 4 represents a striking result. From its current presentation it could be inferred that DNMTs are actively promoting ROS generation from H2O2 and also to a lesser extent in the absence of exogenous H2O2. That would be very surprising and a major finding with far-reaching implications. It would need to be further validated, for example by in vitro reconstitution of the reaction and monitoring ROS production. Rather, I think the authors are proposing that some currently undefined, indirect consequence of DNMT activity promotes ROS generation, especially when exogenous H2O2 is available. It would help if this were clarified.
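
      Point (2) above argues that apparent non-additivity can arise from a non-linear dose-survival relationship alone. As a minimal sketch of that argument (not the authors' analysis), the Python snippet below assumes a hypothetical shouldered survival curve with an arbitrary threshold and steepness, treats MMS and DNMT activity as contributing the same kind of damage that simply adds, and compares the combined survival with the product of the two single-treatment survivals. All function names, parameters and dose values are illustrative assumptions, not measured quantities.

```python
def survival(damage, threshold=1.0, steepness=4.0):
    """Hypothetical shouldered survival curve (an assumption, not fitted data).

    Survival stays close to 1 while damage is well below `threshold`
    (repair copes) and drops steeply once damage exceeds it
    (repair is overwhelmed).
    """
    return 1.0 / (1.0 + (damage / threshold) ** steepness)


# Illustrative damage loads in arbitrary units (assumed values).
mms_damage = 0.6
dnmt_damage = 0.6

# Survival under each treatment alone.
s_mms = survival(mms_damage)
s_dnmt = survival(dnmt_damage)

# Purely additive model: both treatments add to the same damage pool.
s_combined = survival(mms_damage + dnmt_damage)

# Naive expectation if the two treatments killed cells independently.
s_expected_independent = s_mms * s_dnmt

print(f"MMS alone:            {s_mms:.3f}")
print(f"DNMT alone:           {s_dnmt:.3f}")
print(f"Combined (additive):  {s_combined:.3f}")
print(f"Product of singles:   {s_expected_independent:.3f}")
```

      Under these assumed parameters, the combined survival falls below the product of the individual survivals even though the underlying damage is strictly additive. This is why a dilution series of MMS (or H2O2) would help distinguish a threshold effect from genuine synergy.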

    5. Reviewer #2 (Public review):

      5-methylcytosine (5mC) is a key epigenetic mark in DNA and plays a crucial role in regulating gene expression in many eukaryotes, including humans. The DNA methyltransferases (DNMTs) that establish and maintain 5mC, mainly in a CpG context, are conserved in many species across eukaryotes, including animals, plants, and fungi. Interestingly, 5mC levels and distributions are quite variable across phylogenies, with some species even appearing to have no such DNA methylation.

      This interesting and well-written paper continues work that the authors published several years ago. In that previous paper, the laboratory demonstrated that DNA methylation pathways coevolved with DNA repair mechanisms, specifically with the alkylation repair system. They discovered that DNMTs can introduce alkylation damage into DNA in the form of 3-methylcytosine (3mC). (This appears to be an error in the DNMT enzymatic mechanism, in which the generation of 3mC, as opposed to the preferred product 5-methylcytosine (5mC), is caused by the flipped target cytosine binding to the active site pocket of the DNMT in an inverted orientation.) The presence of 3mC is potentially toxic and can cause replication stress, which this paper suggests may explain the loss of DNA methylation in different species. They further showed that the ALKB2 enzyme plays a crucial role in repairing this alkylation damage, further emphasizing the link between DNA methylation and DNA repair.

      The co-evolution of DNMTs with DNA repair mechanisms suggests there can be distinct advantages and disadvantages of DNA methylation to different species, which might depend on their environmental niche. In environments that expose species to high levels of DNA damage, high levels of 5mC in their genome may be disadvantageous. This present paper sets out to examine the sensitivity of an organism to genotoxic stresses such as alkylating and oxidizing agents as a consequence of DNMT activity. Since such a study in eukaryotes would be complicated by DNA methylation controlling gene regulation, these authors cleverly utilize Escherichia coli (E. coli) and incorporate into it DNMTs from other bacteria that methylate the cytosines of DNA in a CpG context like that observed in eukaryotes; the active sites of these enzymes are very similar to those of eukaryotic DNMTs and use essentially the same catalytic mechanism (in addition, this strain of E. coli does not specifically degrade this methylated DNA).

      The experiments in this paper more than adequately show that E. coli expressing these DNMTs (compared with the same strain without the DNMTs) does indeed show increased sensitivity to alkylating agents, and that this sensitivity was even greater than expected when a DNA repair mechanism was inactivated. Moreover, they show that E. coli expressing this DNMT is more sensitive to oxidizing agents such as H2O2 and has exacerbated sensitivity when a DNA repair glycosylase is inactivated. Both propensities suggest that DNMT activity itself may generate additional genotoxic stress. Intrigued that DNMT expression itself might induce sensitivity to oxidative stress, the experimenters used a fluorescent sensor to show that H2O2-induced reactive oxygen species (ROS) are markedly enhanced with DNMT expression. Importantly, they show that DNMT expression alone gave rise to increased ROS amounts and that H2O2 addition and DNMT expression together have a greater effect than the linear combination of the two separately. They also carefully checked that the increased sensitivity to H2O2 was not caused by some effect of DNMT expression and activity on the expression of detoxification genes. Finally, by using mass spectrometry, they show that DNMT expression led to production of the 5mC oxidation derivatives 5-hydroxymethylcytosine (5hmC) and 5-formylcytosine (5fC) in DNA. 5fC is a substrate for base excision repair while 5hmC is not; more 5fC was observed. Introduction of non-bacterial enzymes that produce 5hmC and 5fC into the DNMT-expressing bacteria again showed a greater sensitivity than expected. Remarkably, in their assay with addition of H2O2, bacteria showed no growth with this dual expression of DNMT and these enzymes.

      Overall, the authors conduct well thought-out and simple experiments to show that a disadvantageous consequence of DNMT expression leading to 5mC in DNA is increased sensitivity to oxidative stress as well as alkylating agents.

      Again, the paper is well-written and organized. The hypotheses are well-examined by simple experiments. The results are interesting and can impact many scientific areas, from our understanding of the evolutionary pressures that the environment places on an organism to our understanding of how the environment of a malignant cell in the human body may lead to cancer.

    1. eLife Assessment

      Ferredoxins are ubiquitous electron transfer proteins that drive essential metabolic processes across all domains of life. This fundamental contribution to the field provides the first description of how specific amino acids, through a series of hydrogen bonds, control the ability of iron-sulfur clusters in ferredoxins to accept and donate electrons. The evidence supporting the conclusions is compelling, as is the combined use of neutron crystallography with X-ray crystallography and classical spectral/redox studies.

    2. Reviewer #1 (Public review):

      Summary:

      The authors combined neutron crystallography with room-temperature X-ray crystallography to examine the redox properties of the BtFd [4Fe-4S] cluster expressed in E. coli. The neutron structure allowed the authors to examine the influence of Asp64 on the redox properties of the [4Fe-4S] cluster. The neutron structure also allowed for the identification of the hydrogen network around the [4Fe-4S] structure. This work was followed by density functional theory (DFT) calculations to examine different redox states, which also pointed to the role of Asp64 in affecting or dictating the redox function of the [4Fe-4S] cluster. Based on the DFT work, the authors examined the redox properties under oxic and anoxic conditions in wild-type enzymes and in a D64N mutant, again showing the role of Asp64 in the redox kinetics and redox potential of the [4Fe-4S] cluster. Lastly, the authors examined similar [4Fe-4S] ferredoxins from several organisms with an Asp64 or Glu64 and observed a similar role of Asp64 in the low-potential state of the [4Fe-4S] cluster. The major conclusion of the study was to identify the role of specific amino acids, in this case Asp64, in controlling the redox state and kinetics of [4Fe-4S] clusters. The authors also demonstrate the strength of neutron crystallography when combined with classical X-ray crystallography and classical spectral/redox studies.

      Strengths:

      In general, the experimental design is logical and the results are convincing, demonstrating the role of Asp64 in the redox properties of [4Fe-4S] clusters in ferredoxins.

      Weaknesses:

      Although a role for coordinating amino acids in the redox properties of a functional group is not surprising, this reviewer believes this is a novel result in ferredoxins that makes a nice contribution to the field.

    3. Reviewer #2 (Public review):

      In this study, Wada et al. investigate the low-potential ferredoxin from Bacillus thermoproteolyticus (BtFd) using a combination of neutron crystallography, X-ray crystallography, DFT and spectroscopy to determine the influence of hydrogen bonding networks on the redox potential of ferredoxin's 4Fe-4S cluster. The use of neutron diffraction allowed the authors to probe the precise location of hydrogens around the 4Fe-4S cluster, which was not possible in prior studies, even with the previously reported high-resolution (0.92 Å) structure of BtFd. This allowed the authors to revise prior models of the proposed H-bonding network theorized from earlier X-ray crystallography studies (for example, showing that there is not in fact an H bond formed between the Thr63-Oγ1 and the [4Fe-4S]-S4 atoms). With this newly described H-bonding network established, the electronic structure of the 4Fe-4S cluster was then investigated using DFT methodology, revealing a startling role of the deprotonated surface residue Asp64, which bears substantial electronic density in the LUMO that is otherwise localized to the 4Fe-4S cluster. While aspartate is usually deprotonated at physiological pH, the authors provide compelling evidence that this aspartate has a much higher pKa than is usual and is able to act as a protonation-dependent switch that controls the stability of the reduced state of the 4Fe-4S cluster, and thus the redox potential.

      The findings of this study and the conclusions drawn from them are well supported by the data and computational work. Their findings have implications for similar control mechanisms in other, non-ferredoxin 4Fe-4S bearing electron transport proteins which have yet to be explored, providing great value to the metalloprotein community. One change that the authors may consider to enhance the clarity of the manuscript regards the nomenclature used for the varying models discussed (CM, CMNA, CMH and so forth). It would be beneficial to the reader if the nomenclature included the redox state (ox. vs red.) of the model in the model's name.

    1. eLife Assessment

      This manuscript describes valuable new material of small, unusually preserved fossils from deep in the Cambrian of China and argues they represent very early bilaterian animals such as annelids or panarthropods. The authors present convincing evidence of the fossilisation of specimens as microbial pseudomorphs, however, the fossils show few details and it is difficult to assess their affinity. The broader claims made about the timing and nature of the Cambrian explosion are inadequately supported by the material, given that bilaterians were already known to exist during that period.

    2. Reviewer #1 (Public review):

      Summary:

      A description of small phosphatised fossils from the Kuanchuanpu Formation that are claimed to represent unequivocal early segmented bilaterians with limbs, i.e. annelids or panarthropods. All material from the Kuanchuanpu is of interest, and the mode of preservation is certainly striking.

      However, few details apart from bilateral symmetry and paired protrusions are present, and fragments of potential progenitors such as anabaritids cannot be entirely ruled out. In addition, the broader claims about the nature of the Cambrian explosion, the gap between the fossil record and molecular clocks, and what various authors have said about them are either inadequate or incorrect. For example, Budd and Jackson did not at all make the claim that the earliest bilaterians were soft-bodied and tiny. Glaessner (1958) is a very out-of-date reference to use. We know that bilaterians certainly existed by the time of the Kuanchuanpu.

      Even so, it is possible that these fragments do represent internal moulds of taxa such as lobopod-like organisms, even if the evidence is not totally persuasive.

    3. Reviewer #2 (Public review):

      This manuscript by Yang et al. describes a variety of bilateral and segmented microfossils from the basal Cambrian (Fortunian Stage) Kuanchuanpu Formation, South China. During the Fortunian Stage, body fossils are scarce, and key evidence for the presence of different clades relies on exceptionally preserved microfossils of embryos and larvae. The authors interpret the described microfossils as segmented bilaterians, with anteroposterior and dorsoventral differentiation and paired appendages. The implication of this interpretation is that the microfossils represent important evidence for early bilaterian evolution.

      The strength of the manuscript is the convincing presentation of the material's bilateral and segmented nature and its taphonomy. The combined use of scanning electron microscopy and X-ray computed tomography to illustrate the material convincingly supports the argument of a bilaterian affinity. Likewise, the visualization of the cemented vesicles composed of phosphate nanocrystals that make up the fossils' internal molds supports the proposed taphonomic pathway.

      The weakness of the manuscript is the further biological interpretations. While the manuscript presents a convincing argument that the molds derive from overall segmented (metameric) body plans, it does not fully explore which cavities/organs are actually molded. Instead, it assumes without discussion that the molds reflect the cuticle with a loss of fine external structures (e.g., setae). While external sclerites and cuticles are convincingly displayed in one case (Figure Supplement 5), more options exist for the rest of the material. Here, molds could perhaps represent other cavities, such as guts (including diverticula) or perivisceral cavities, both consistent with a lack of fine external details as well as an endogenous taphonomic pathway. A proper exploration of what these molds actually represent is, therefore, crucial to interpreting the ecological and evolutionary implications of the fossils.

      Despite its weakness, the manuscript demonstrates convincing evidence of bilaterian microfossils in the Fortunian Stage. This evidence, in itself, contributes valuable information on the Cambrian animal radiation.

    1. eLife Assessment

      This important study employed multiple orthogonal techniques and tissue samples to investigate the interaction between the NRL transcription factor and RNA-binding proteins in the retina. The findings provide solid support for an interaction between NRL and the DHX9 helicase. However, the evidence for an interaction between the NRL transcription factor and R-loops is less conclusive. The significance of the study could be enhanced by examining the functional role of NRL interactions with R-loops in the developing retina, which would offer new insights into the gene regulatory networks.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Corso-Diaz et al. focus on the NRL transcription factor (TF), which is critical for retinal rod photoreceptor development and function. The authors profile NRL's protein interactome, revealing several RNA-binding proteins (RBPs) among its components. Notably, many of these RBPs are associated with R-loop biology, including DHX9 helicase, which is the primary focus of this study. R-loops are three-stranded nucleic acid structures that frequently form during transcription. The authors demonstrate that R-loop levels increase during photoreceptor maturation and establish an interaction between NRL TF and DHX9 helicase. The association between NRL and RBPs like DHX9 suggests a cooperative regulation of gene expression in a cell-type-specific manner, an intriguing discovery relevant to photoreceptor health. Since DHX9 is a key regulator of R-loop homeostasis, the study proposes a potential mechanism where a cell-type-specific TF controls the expression of certain genes by modulating R-loop homeostasis. This study also presents the first data on R-loop mapping in mammalian retinas and shows the enrichment of R-loops over intergenic regions as well as genes encoding neuronal function factors. While the research topic is very important, there is some concern regarding the data presented. There are substantial data supporting the interaction between NRL and DHX9, including pull-down experiments and the proximity ligation assay (PLA); however, the data showing an interaction between NRL and DDX5, another R-loop-associated helicase, are inadequate. Importantly, the data supporting the claim that NRL interacts with R-loops are insufficient and, at best, correlative. Further concerns relate to the R-loop mapping data analysis and visualization.

      Strengths:

      There is compelling evidence that the NRL transcription factor interacts with several RNA binding proteins, and specifically, sufficient data supporting the interaction of NRL with DHX9 helicase. A major strength is the use of the single-stranded R-loop mapping method in the mouse retina.

      Weaknesses:

      (1) Figure S1A: There is a strong band in the GST-IP (control IP) for either HNRNPUL1 or HNRNPU, although the authors state in their results that there is a strong interaction of these two RBPs with NRL. Both DHX9 and DDX5 samples have a faint band in the GST-IP. There is an extremely faint band for HNRNPA2B1 in the GST-NRL IP lane. Given that this is a pull-down with added benzonase treatment to remove all nucleic acids, these data suggest that the previously observed NRL interactions with these particular RBPs are mediated via nucleic acids. Similarly, there is a loss of band signal for HNRNPM in this assay, although it was identified as an NRL-interacting protein in three assays, which again suggests that nucleic acids mediate the interaction.

      (2) The data supporting NRL-DDX5 interaction in rod photoreceptor nuclei is very weak. In Figure 2D, the PLA signal for DDX5-NRL is very weak in the adult mouse retina and is absent in the human retina, as shown in Figure 2H. Given that there is no NRL-KO available for the human PLA assay, the control experiments using single-protein antibodies should be included in the assay. Similarly, the single-protein antibody control PLA experiments should be included in the experimental data presented in Figure 2J.

      (3) The EMSA experiment using a probe containing NRL binding motif within the DHX9 promoter should include incubation with retina nuclear extracts depleted for NRL as a control.

      (4) There is a reduced amount of DHX9 pulled down in NRL-IP in HEK293 cells, but there is no statistically significant difference in the reciprocal IP (DHX9-IP and blotting for NRL) (Figure 4C).

      (5) The only data supporting the claim that NRL interacts with R-loops are presented in Figure 5A. This is a co-IP of R-loops and then blotting for NRL, DHX9, and DDX5. Here, there is no signal for DDX5, quantification of DHX9 signal shows no statistically significant difference between RNase H treated and untreated samples, while NRL shows a signal in RNase H treated sample. These data are not sufficient to make the statement regarding the interaction of NRL with R-loops.

      (6) Regarding R-loop mapping, the data analysis is quite confusing. The authors perform two different types of analyses: either overall narrow and broad peak analysis or strand-specific analysis. Given that the authors used ssDRIP-seq, which is a method designed to map R-loops strand-specifically, it is confusing to perform different types of analyses. Next, the peak analysis is usually performed based on the RNase H-treated R-loop mapping; what does it mean then to have a pool of "Not R-loops" (see Figure 6B)? In that regard, what does the term "unstranded" R-loops mean? Based on the authors' definition, these are R-loops that do not fall within the group of strand-specific R-loops. The authors should explain the reasons behind these types of analyses and explain what the biological relevance of these different types of R-loops is.

      (7) It would be more useful to show the percent distribution of R-loops over the different genomic regions, instead of showing p-value enrichment, see Figure 6C.

      (8) Based on the model presented, NRL regulates R-loop biology via interaction with RBPs, such as DHX9, a known R-loop resolution helicase. Given that the gene targets of NRL TF are known, it would be useful to then analyze the R-loop mapping data across this gene set.

    3. Reviewer #2 (Public review):

      Summary:

      The authors utilize biochemical approaches to determine and validate NRL protein-protein interactions to further understand the mechanisms by which the NRL transcription factor controls rod photoreceptor gene regulatory networks. Observations that NRL displays numerous protein-protein interactions with RNA-binding proteins, many of which are involved in R-loop biology, led the authors to investigate the role of RNA and R-loops in mediating protein-protein interactions and profile the co-localization of R-loops with NRL genomic occupancy.

      Strengths:

      Overall, the manuscript is very well written, providing succinct explanations of the observed results and potential implications. Additionally, the authors use multiple orthogonal techniques and tissue samples to reproduce and validate that NRL interacts with DHX9 and DDX5. Experiments also utilize specific assays to understand the influence of RNA and R-loops on protein-protein interactions. The authors also use state-of-the-art techniques to profile R-loop localization within the retina and integrate multiple previously established datasets to correlate R-loop presence with transcription factor binding and chromatin marks in an attempt to understand the significance of R-loops in the retina.

      Weaknesses:

      In general, the authors provide superficial interpretations of the data that fit a narrative but fail to provide alternative explanations or address caveats of the results. Specifically, many bands are present in interaction studies either in control lanes (GST controls) of Westerns or large amounts of background in PLA experiments. Additionally, the lack of experiments testing the functional significance of Nrl interactions or R-loops within the developing retina fails to provide novel biological insights into the regulation of gene regulatory networks other than, 'This could be a potentially important new mechanism'. Additionally, the authors test the necessity of RNA for NRL/DHX9 interactions but don't show RNA binding of NRL or DHX9 or the sufficiency of RNA to interfere/mediate protein-protein interactions. Recent work has highlighted the prevalence of RNA binding by transcription factors through Arginine Rich Motifs that are located near the DNA binding domains of transcription factors.

    1. eLife Assessment

      This study reports an important new scRNAseq atlas of the mouse cranial neural plate during neural induction, patterning, and morphogenesis. The study includes a robust analysis of scRNAseq datasets covering six distinct developmental stages, as well as data describing the global transcriptional response of neural plate cells to a key ventralizing signaling molecule, Sonic Hedgehog. The computational data and validation of gene expression patterns are convincing, making this a helpful resource for investigators studying the early development of the cranial neural plate and cranial mesoderm.

    2. Reviewer #1 (Public review):

      Summary:

      This impressive study presents a comprehensive scRNAseq atlas of the cranial region during neural induction, patterning, and morphogenesis. The authors collected a robust scRNAseq dataset covering six distinct developmental stages. The analysis focused on the neural tissue, resulting in a highly detailed temporal map of neural plate development. The findings demonstrate how different cell fates are organized in specific spatial patterns along the anterior-posterior and medial-lateral axes within the developing neural tissue. Additionally, the research utilized high-density single-cell RNA sequencing (scRNAseq) to reveal intricate spatial and temporal patterns independent of traditional spatial techniques.

      The investigation utilized diffusion component analysis to spatially order cells based on their positioning along the anterior-posterior axis, corresponding to the forebrain, midbrain, and hindbrain, and along the medial-lateral axis. By cross-referencing with MGI expression data, the identification of cell types was validated, affirming the expression patterns of numerous known genes and implicating others as differentially expressed along these axes. These findings significantly advance our understanding of the spatially regulated genes in neural tissues during early developmental stages. The emphasis on transcription factors, cell surface, and secreted proteins provides valuable insights into the intricate gene regulatory networks underpinning neural tissue patterning. Analysis of a second scRNAseq dataset where Shh signaling was activated by culturing embryos in SAG identified known and previously unknown transcripts regulated by Shh, including the Wnt pathway.

      The data includes the neural plate and captures all major cell types in the head, including the mesoderm, endoderm, non-neural ectoderm, neural crest, notochord, and blood. With further analyses, this high-quality data promises to significantly advance our understanding of how these tissues develop in conjunction with the neural tissue, paving the way for future breakthroughs in developmental biology and genomics.

      Strengths:

      The data is well presented in the figures and thoroughly described in the text. The quality of the scRNAseq data and bioinformatic analysis is exceptional.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      Brooks et al. generate a gene expression atlas of the early embryonic cranial neural plate. They generate single-cell transcriptome data from early cranial neural plate cells at 6 consecutive stages between E7.5 and E9. Utilizing computational analysis, they infer temporal gene expression dynamics and spatial gene expression patterns along the anterior-posterior and mediolateral axes of the neural plate. Subsequent comparison with known gene expression patterns revealed a good agreement with their inferred patterns, thus validating their approach. They then focus on Sonic Hedgehog (Shh) signalling, a key morphogen signal, whose activities partition the neural plate into distinct gene expression domains along the mediolateral axis. Single-cell transcriptome analysis of embryos in which the Shh pathway was pharmacologically activated throughout the neural plate revealed characteristic changes in gene expression along the mediolateral axis and the induction of distinct Shh-regulated gene expression programs in the developing fore-, mid-, and hindbrain.

      Strengths:

      This manuscript provides a comprehensive transcriptomic characterisation of the developing cranial neural plate, a part of the embryo that to my knowledge has not been extensively analysed by single-cell transcriptomic approaches. The single-cell sequencing data appears to be of high quality and will be a great resource for the wider scientific community. Moreover, the computational analysis is well executed and the validation of the sequencing data using published gene expression patterns is convincing. Taken together, this is a well-executed study that describes a relevant scientific resource for the wider scientific community.

      Weaknesses:

      Conceptually, the findings that gene expression patterns differ along the rostrocaudal, mediolateral, and temporal axes of the neural plate and that Shh signalling induces distinct target genes along the anterior-posterior axis of the nervous system are more expected than surprising. However, the strength of this manuscript is again the comprehensive characterization of the spatiotemporal gene expression patterns and how they change upon ectopic activation of the Shh pathway.

    4. Reviewer #3 (Public review):

      Summary:

      The authors performed a detailed single-cell analysis of the early embryonic cranial neural plate with unprecedented temporal resolution between embryonic days 7.5 and 8.75. They employed diffusion analysis to identify genes that correspond to different temporal and spatial locations within the embryo. Finally, they also examined the global response of cranial tissue to a Smoothened agonist.

      Strengths:

      Overall, this is an impressive resource, well-validated against sets of genes with known temporal and spatial patterns of expression. It will be of great value to investigators examining the early stages of neural plate patterning, neural progenitor diversity, and the roles of signaling molecules and gene regulatory networks controlling the regionalization and diversification of the neural plate.

      Weaknesses:

      The manuscript should be considered a resource. Experimental manipulation is limited to the analysis of neural plate cells that were cultured in vitro for 12 hours with SAG. Besides the identification of a significant set of previously unreported genes that are differentially expressed in the cranial neural plate, there is little new biological insight emerging from this study. Some additional analyses might help to highlight novel hypotheses arising from this remarkable resource.

    1. eLife Assessment

      This manuscript focuses on the identification of RNA crosslinks within the HIV RNA genome under different conditions i.e. in infected cells and in virions using a new method called HiCapR. These cross-links reveal long-range interactions that can be used to determine the structural arrangement of the viral RNA, providing useful data that show differences in the genomic organization in different conditions. The data analysis, however, is incomplete and based on extensive computational analysis from a limited number of datasets, which are in need of experimental validation.

    2. Reviewer #1 (Public review):

      This paper focuses on secondary structure and homodimers in the HIV genome. The authors introduce a new method called HiCapR which reveals secondary structure, homodimer, and long-range interactions in the HIV genome. The experimental design and data analysis are well-documented and statistically sound. However, the manuscript could be further improved in the following aspects.

      Major comments:

      (1) Please give the full name of an abbreviation the first time it appears in the paper, for example, in L37, "5' UTR" "RRE".

      (2) The introduction could be strengthened by discussing the limitations of existing methods for studying HIV RNA structures and interactions and highlighting the specific advantages of the HiCapR method.

      (3) Please reorganize Results Part 1.

      (4) Is there any reason that the authors mention "genome structure of SARS-CoV-2" in L95?

      (5) L102: Please clarify the purpose of comparing "NL4-3" and "GX2005002." Additionally, could you explain what NL4-3 and GX2005002 are? The connection between NL4-3, GX2005002, and HIV appears to be missing.

      (6) Figure 1A is not able to clearly present the innovation point of HiCapR.

      (7) Please compare the contact metrics detected by HiCapR and current techniques like SHAPE on the local interactions to assess the accuracy of HiCapR in capturing local RNA interactions relative to established methods.

      (8) The paper needs further language editing.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript "Mapping HIV-1 RNA Structure, Homodimers, Long-Range Interactions and persistent domains by HiCapR", Zhang et al. report results from an omics-type approach to mapping RNA crosslinks within the HIV RNA genome under different conditions, i.e., in infected cells and in virions. Reportedly, they used a previously published method which, in the present case, was improved for application to RNAs of low abundance.

      Their claims include the detection of numerous long-range interactions, some of which differ between cellular and virion RNA. Further claims concern the detection and analysis of homodimers.

      Strengths:

      (1) The method developed here works with extremely little viral RNA input and allows for the comparison of RNA from infected cells versus virions.

      (2) The findings, if validated properly, are certainly interesting to the community.

      Weaknesses:

      (1) On the communication level, the present version of the manuscript suffers from a number of shortcomings. I may be insufficiently familiar with habits in this community, but for RNA aficionados just a little bit outside of the viral-RNA-X-link community, the original method (reference 22) and the presumed improvement here are far too little explained, namely in something like three lines (98-100). This is not at all conducive to further reading.

      (2) Experimentally, the manuscript seems to be based on a single biological replicate, so there is strong concern about reproducibility.

      (3) The authors perform an extensive computational analysis from a limited number of datasets, which are in thorough need of experimental validation.

    1. eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage-gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a new proposed inhibition mechanism, with the strength of support being mostly convincing but incomplete in some aspects. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in the development of selective inhibitors for protein targets.

    2. Reviewer #1 (Public review):

      Summary:

      This study seeks to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, it sought to explain some of the functional differences that RY785 exhibits in electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). This study used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. While TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple non-polar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. This mechanism was confirmed using an additional set of simulations and used to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The study develops forcefield parameters for the RY785 molecule based on extensive QM-based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single-channel conductance. The study performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The conclusion is that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits the K+ current. This conclusion is plausible given that RY785 makes stable contact with multiple hydrophobic residues in the S6 helix. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.

      Weaknesses:

      The study, however, did not produce this semi-closed channel conformation and acknowledges that more direct simulation evidence would require extensive enhanced-sampling simulations. The study has not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the study quantified K+ permeation, it does not make any estimates of the ligand binding affinities or rates, which could have been potentially compared to the experiment and used to validate the models.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Zhang et al. investigate the conductivity and inhibition mechanisms of the Kv2.1 channel, focusing on the distinct effects of TEA and RY785 on Kv2 potassium channels. The study employs microsecond-scale molecular dynamics simulations to characterize K+ ion permeation and compound binding inhibition in the central pore.

      Strengths:

      The findings reveal a unique inhibition mechanism for RY785, which binds to the channel walls in the open structure while allowing reduced K+ flow. The study also proposes a long-range allosteric coupling between RY785 binding in the central pore and its effects on voltage-sensing domain dynamics. Overall, this well-organized paper presents a high-quality study with robust simulation and analysis methods, offering novel insights into voltage-gated ion channel inhibition that could prove valuable for future drug design efforts.

      Weaknesses:

      (1) The study neglects to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, there is potential for allosteric binding sites in the voltage-sensing domain (VSD), as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019).

      (2) The study describes RY785 as a selective inhibitor of Kv2 channels and characterizes its binding residues through MD simulations. However, it is not clear whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      (3) The study does not clarify the details, rationale, and ramifications of a biasing potential to dihedral angles.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing, yet it was not revealed whether polar groups of RY785 always interact with K+ ions.

    1. eLife Assessment

      This is an important study demonstrating the importance of S100A4+ alveolar macrophages in the earlier stages of tumour development and suggesting a role in angiogenesis. As such this solid study is of interest to cancer biologists focused on early tumour development and those interested in the development of therapeutics that may specifically target early cancers.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors have leveraged single-cell RNA sequencing of the various stages of the evolution of lung adenocarcinoma to identify the population of macrophages that contribute to tumor progression. They show that S100a4+ alveolar macrophages, which are active in fatty acid metabolism, such as palmitic acid metabolism, seem to drive the atypical adenomatous hyperplasia (AAH) stage. These macrophages also seem to induce angiogenesis, promoting tumor growth. Similar types of macrophage infiltration were demonstrated in the progression of human lung adenocarcinomas.

      Strengths:

      Identification of the metabolic pathways that promote angiogenesis-dependent progression of lung adenocarcinomas from early atypical changes to aggressive invasive phenotype could lead to the development of strategies to abort tumor progression.

      Weaknesses:

      (1) Can the authors demonstrate the functional specialization of the S100a4+ alveolar macrophages that promotes the progression of AAH to the more aggressive phenotype? What are the factors produced by these unique macrophages that induce tumor progression and invasiveness?

      (2) Angiogenic factors are not only produced by the S100a4+ cells but also by pericytes and potentially by the tumor cells themselves. Then, how do these factors aberrantly trigger tumor angiogenesis that drives tumor growth?

      (3) It is not clear how abnormal fatty acid uptake by the macrophages drives the progression of tumors.

      (4) Does infusion or introduction of S100a4+ polarized macrophages promote the progression of AAH to a more aggressive phenotype?

      (5) How does Anxa and Ramp1 induction in inflammatory cells induce angiogenesis and tumor progression?

      (6) For the in vitro studies the authors might consider using primary tumor cells and not cell lines.

    3. Reviewer #2 (Public review):

      Summary:

      The work aims to further understand the role of macrophages in lung precancer/lung cancer evolution.

      Strengths:

      (1) The use of single-cell RNA seq to provide comprehensive characterisation.

      (2) Characterisation of cross-talk between macrophages and the lung precancerous cells.

      (3) Functional validation of the effects of S100a4+ cells on lung precancerous cells using in vitro assays.

      (4) Validation in human tissue samples of lung precancer / invasive lesions.

      Weaknesses:

      (1) The authors need to provide clarification of several points in the text.

      (2) The authors need to carefully assess their assumptions regarding the role of macrophages in angiogenesis in precancerous lesions.

      (3) The authors should discuss more broadly the current state of anti-macrophage therapies in the clinic.

    1. eLife Assessment

      This useful study partially succeeds in providing solid evidence in support of the therapeutic potential of the plant-derived compound eugenol for ameliorating symptoms associated with STZ-induced oxidative stress, identifying Nuclear factor E2-related factor (Nrf2) as a mediator of the effects induced by eugenol. Although the study provides interesting data, there remain concerns associated with the STZ model and the rather superficial mechanistic assessment.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors consider the effects of eugenol (EUG), a plant-produced substance known to reduce oxidative stress in various cellular contexts via Nrf2, in alleviating the effects of streptozotocin (STZ), a known rodent beta cell toxin. They claim that EUG treatment would be useful for T1D therapy.

      Strengths:

      The experiments shown are sufficiently clear and rather convincing in documenting that eugenol can revert the effects of streptozotocin on animal physiology as well as beta cell oxidative stress and cell death via activation of Nrf2.

      In the revised manuscript the authors corrected/explained most of the specific inconsistencies/mistakes pointed out.

      However, they did not address the opening paragraph that points out major concerns. I summarize them below, together with some that were dealt with in their response but still remain unaddressed or not commented upon.

      - STZ treatment cannot be used as a T1D model for the reasons I outlined in my previous letter. I would have been happy to see a response on that but they did not provide any. The manuscript is misleading in this important respect.

      - Mechanistically, the manuscript remains at a rather superficial level. I highlighted some possibilities to enrich the manuscript but none was addressed even in the discussion.

      (a) How is eugenol penetrating the cell, is there a receptor that could be potentially targeted?

      (b) Are there intermediary proteins that convey the effect to the Nrf2/Keap1 complex or is eugenol directly disrupting their interaction?

      (c) What are direct downstream Nrf2 effectors?

      (d) Besides, streptozotocin is also a powerful DNA alkylating agent, are such effects relieved by eugenol?

      - It is puzzling that all molecular analyses show a gradual reversion effect with increasing doses of eugenol but this gradual effect is apparently missing in many of the physiological parameters assessed in Figure 1, including the all-important OGTT assays. Can the authors interpret this? In the high eugenol group in the OGTT assays, there is a group of mice that are clearly outliers. Most likely the STZ treatment for these mice was not efficient and their inclusion skews the results. Besides, it is important to assess differences among eugenol groups (one-way ANOVA). The statistical tests provided are incomplete and sometimes not done correctly.

      - Given that medical research is still heavily biased in favor of analyses in males, and given that the authors have analyzed a very large number of animals in Figure 1, what are the results stratified by sex?

    3. Reviewer #3 (Public review):

      Summary:

      This study by Jiang et al. aims to establish the streptozotocin (STZ)-induced type 1 diabetes mellitus (T1DM) mouse model in vivo and the STZ-induced pancreatic β cell MIN6 cell model in vitro to explore the protective effects of Eugenol (EUG) on T1DM. The authors tried to elucidate the potential mechanism by which EUG inhibits the NRF2-mediated anti-oxidative stress pathway. Overall, this study is well executed with solid data, offering an intriguing report from animal studies for a potential new treatment strategy for T1DM.

      Strengths:

      The in vivo efficacy study is comprehensive and solid. Given that STZ-induced T1DM is a devastating and harsh model, the in vivo efficacy of this compound is really impressive.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      Type 1 diabetes mellitus (T1DM) progression is accelerated by oxidative stress and apoptosis. Eugenol (EUG) is a natural compound previously documented as anti-inflammatory, anti-oxidative, and anti-apoptotic. In this manuscript by Jiang et al., the authors study the effects of EUG on T1DM in MIN6 insulinoma cells and a mouse model of chemically induced T1DM. The authors show that EUG increases nuclear factor E2-related factor 2 (Nrf2) levels. This results in a reduction of pancreatic beta-cell damage, apoptosis, oxidative stress markers, and a recovery of insulin secretion. The authors highlight these effects as indicative of the therapeutic potential of EUG in managing T1DM.

      Strengths

      Relevant, timely, and addresses an interesting question in the field. The authors consistently observe enhanced beta cell functionality following EUG treatment, which makes the compound a promising candidate for T1DM therapy.

      Weaknesses

      (1) The in vivo experiments have too few biological replicates. With an n=3 (as all figure legends indicate) in complex mouse studies such as these, drawing robust conclusions becomes challenging. It is important to reproduce these results in a larger cohort, to validate the conclusions of the authors.

      Thanks for your comments. In the figure legends of the first draft of the manuscript, n=3 means at least 3 biological replicates, whereas in the Materials and Methods section, n=30 refers to the sample size. Each group contained 30 mice, for a total of 150 mice in this study, and the mice were assigned as follows for the in vivo experiments. The relevant information has been added to the revised manuscript.

      Author response image 1.

      (2) Another big concern is the lack of quantifications and statistical analysis throughout the manuscript. Although the authors claim statistical significance in various experiments, the limited information provided makes it difficult to verify. The authors use vague and minimal descriptions of their experiments, which further reduces the reader's comprehension and the reproducibility of the experiments.

      Thanks for your constructive suggestion. We repeated the quantitative and statistical analysis for the entire manuscript using GraphPad Prism software. Additionally, we have improved the experimental description in the revised manuscript.

      (3) Finally, the use of Min6 cells as a model for pancreatic beta cells is a strong limitation of this study. Future studies should seek to reproduce these findings in a more translational model and use more relevant in vitro cell systems (e.g., islets).

      Thanks for your professional comments. Mouse insulinoma cells (MIN6 cell line) are a permanent cell line isolated from mouse islet β cell tumors, which can reflect the functional changes of islet β cells. As mature islet cells, MIN6 cells have been widely used in the study of type 1 diabetes mellitus[1-4], so in this study, MIN6 cells were used as the in vitro cell model. In future studies, we will try to reproduce our findings using more relevant in vitro cell systems (e.g., islets).

      References:

      (1) WU M, CHEN W, ZHANG S, et al. Rotenone protects against β-cell apoptosis and attenuates type 1 diabetes mellitus [J]. Apoptosis, 2019, 24(11-12): 879-91.

      (2) LUO C, HOU C, YANG D, et al. Urolithin C alleviates pancreatic β-cell dysfunction in type 1 diabetes by activating Nrf2 signaling [J]. Nutr Diabetes, 2023, 13(1): 24.

      (3) LAKHTER A J, PRATT R E, MOORE R E, et al. Beta cell extracellular vesicle miR-21-5p cargo is increased in response to inflammatory cytokines and serves as a biomarker of type 1 diabetes [J]. Diabetologia, 2018, 61(5): 1124-34.

      (4) LIN Y, SUN Z. Antiaging Gene Klotho Attenuates Pancreatic β-Cell Apoptosis in Type 1 Diabetes [J]. Diabetes, 2015, 64(12): 4298-311.

      Reviewer #3 (Public Review):

      Summary:

      This study by Jiang et al. aims to establish the streptozotocin (STZ)-induced type 1 diabetes mellitus (T1DM) mouse model in vivo and the STZ-induced pancreatic β cell MIN6 cell model in vitro to explore the protective effects of Eugenol (EUG) on T1DM. The authors tried to elucidate the potential mechanism by which EUG inhibits the NRF2-mediated anti-oxidative stress pathway. Overall, this study is well executed with solid data, offering an intriguing report from animal studies for a potential new treatment strategy for T1DM.

      Strengths:

      The in vivo efficacy study is comprehensive and solid. Given that STZ-induced T1DM is a devastating and harsh model, the in vivo efficacy of this compound is really impressive.

      Weaknesses:

      (1) The mechanism is linked with the anti-oxidant property of the compound, which is common for many natural compounds, such as flavonoids and polyphenols. However, this kind of compound has rarely been successfully developed into therapeutics for clinical use. Indeed, if that is the case, Vitamin C or Vitamin E could be used here as the positive control.

      Thanks for your comments. In fact, many anti-oxidant drugs are used for the treatment of type 1 diabetes mellitus in clinical practice. For example, lipoic acid has been used to treat diabetic peripheral neuropathy[5]. Vitamin E can effectively eliminate free radicals, protect cell membranes, and significantly reduce the risk of cardiovascular disease in patients with SPACE or ICARE diabetes[6]. Glutathione plays crucial roles in the detoxification and anti-oxidant systems of cells and has been used to treat acute poisoning and chronic liver diseases by intravenous injection[7]. Therefore, eugenol, which improves type 1 diabetes mellitus by modulating oxidative stress pathways, holds potential as a future therapeutic choice for clinical application. In relevant future studies, we will try to use Vitamin C or Vitamin E as the positive control.

      References:

      (5) ZIEGLER D, PAPANAS N, SCHNELL O, et al. Current concepts in the management of diabetic polyneuropathy [J]. J Diabetes Investig, 2021, 12(4): 464-75.

      (6) VARDI M, LEVY N S, LEVY A P. Vitamin E in the prevention of cardiovascular disease: the importance of proper patient selection [J]. J Lipid Res, 2013, 54(9): 2307-14.

      (7) HONDA Y, KESSOKU T, SUMIDA Y, et al. Efficacy of glutathione for the treatment of nonalcoholic fatty liver disease: an open-label, single-arm, multicenter, pilot study [J]. BMC Gastroenterol, 2017, 17(1): 96.

      Reviewer #1 (Recommendations For The Authors):

      • For each of the figure panels the authors should indicate the exact number of biological replicates (how many mice or how many independent in vitro experiments). For IF panels, the number of mice, the number of histology slides per mouse, number of fields analyzed should be indicated.

      Thanks for your constructive suggestion. These details have been added in the revised manuscript.

      • The methods state n=30 and Figure 1 states n=3. N=3 is too little for such a complex in vivo study and would severely reduce the reliability of the in vivo experiments.

      Thanks for your suggestion. In the figure legends of the first draft of the manuscript, n=3 meant at least 3 biological replicates, whereas in the Materials and Methods section, n=30 referred to the sample size. Each group contained 30 mice, for a total of 150 mice in this study, and mice are assigned as follows for the whole in vivo experiments. The in vivo experimental data of Figure 1 have been supplemented in the revised manuscript.

      • Individual data points should be included in each of the graphs from this manuscript.

      Thanks for your reminder. The revised manuscript now shows the individual data points in each of the graphs.

      • The quantifications and statistics in the manuscript need improvement. Several experiments are missing quantifications and/or statistical tests (e.g. Figure 1J). Other experiments show a quantification but without any explanation of replicates (e.g. Figures 2B and 2G). None of the experiments show individual data points, and as in the previous comment, these should be included.

      Thanks for your comments. In the revised manuscript, statistics and repetitions of experimental data have been supplemented, and individual data points were shown in each graph.

      • What is the reason for intragastric administration? The previous studies on which the dosages were based used oral administration (gavage). (Discussed in methods 4.2).

      Thanks for your professional comments. Intervention treatment of T1DM mice is conducted through two methods: oral administration[8] and oral gavage[9-11]. Due to limited experimental conditions, it was not feasible to house and feed each mouse in its own cage, which makes it challenging to precisely control the actual daily intervention dose for each mouse when using oral administration. To ensure that each mouse received the expected dose according to its weight, we used gavage. In addition, oral gavage is more convenient and easier to perform than oral administration. Therefore, the in vivo experiments of this study used eugenol gavage as the treatment method. These details have been added in the revised manuscript.

      References:

      (8) ZHAO H, WU H, DUAN M, et al. Cinnamaldehyde Improves Metabolic Functions in Streptozotocin-Induced Diabetic Mice by Regulating Gut Microbiota [J]. Drug Des Devel Ther, 2021, 15: 2339-55.

      (9) XING D, ZHOU Q, WANG Y, et al. Effects of Tauroursodeoxycholic Acid and 4-Phenylbutyric Acid on Selenium Distribution in Mice Model with Type 1 Diabetes [J]. Biol Trace Elem Res, 2023, 201(3): 1205-13.

      (10) SUDIRMAN S, LAI C S, YAN Y L, et al. Histological evidence of chitosan-encapsulated curcumin suppresses heart and kidney damages on streptozotocin-induced type-1 diabetes in mice model [J]. Sci Rep, 2019, 9(1): 15233.

      (11) YAO H, SHI H, JIANG C, et al. L-Fucose promotes enteric nervous system regeneration in type 1 diabetic mice by inhibiting SMAD2 signaling pathway in enteric neural precursor cells [J]. Cell Commun Signal, 2023, 21(1): 273.

      • Urine volume cannot be specified per mouse (methods 4.4) unless the mice were single-housed or if the different groups were not mixed, both are not ideal study set-ups. Please clarify in the methods section.

      Thanks for your constructive suggestion. After successful modeling of T1DM, the mice were grouped according to method 4.2 as follows: Control, T1DM, T1DM + EUG (5 mg/kg/day), T1DM + EUG (10 mg/kg/day), and T1DM + EUG (20 mg/kg/day). To ensure consistency among groups, each group consisted of 5 mice and received equal amounts of diet (100 g) and drinking water (250 mL) under the same environmental conditions. The urine-soaked area for each group was recorded to quantify the urine volume. The conditions were the same for each group. The description of Method 4.4 has been improved in the revised manuscript.

      • OGTT (Figure 1H) of week 2 is missing. This is an important control time point, as it would show the effect of STZ before EUG treatment.

      Thanks for your careful review. OGTT (Figure 1H) of week 2 has been added in the revised manuscript.

      • In Figure 1J, the control group does not follow the expected ITT trajectory. If possible, add the 120-minute time point to see if the blood glucose levels return to baseline in the control group. The graph shows increased basal glucose levels in the experimental groups, but no differences in insulin tolerance. It also misses the AUC calculations. It is probably not significantly different, which should be noted in the text.

      Thanks for your suggestion. T1DM primarily manifests as pancreatic β cell damage and an absolute reduction of insulin secretion, resulting in disordered glucose metabolism in vivo. The oral glucose tolerance test (OGTT) is a series of plasma glucose concentrations measured within 2 h after oral gavage of a defined amount of glucose. It is a standard method to evaluate an individual's ability to regulate blood glucose and to assess the function of islet β cells. Insulin resistance is a reduction, for various reasons, in the efficiency with which insulin promotes glucose uptake and utilization; the body compensates by secreting excess insulin, which leads to hyperinsulinemia, to keep blood glucose stable. The insulin tolerance test (ITT) is commonly employed to detect insulin resistance in T2DM. However, we found that the ITT has little relevance to T1DM. Therefore, the ITT experiment of Figure 1J and the related description have been removed from the revised manuscript.
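The AUC calculation that the reviewer asks for can be sketched for an OGTT time course with the trapezoidal rule. This is a minimal illustration only: the function name, the sampling times, and the glucose values are hypothetical placeholders, not data or methods from the study.

```python
def trapezoid_auc(times_min, glucose_mmol_l):
    """Area under the glucose curve by the trapezoidal rule (mmol/L x min)."""
    if len(times_min) != len(glucose_mmol_l) or len(times_min) < 2:
        raise ValueError("need at least two paired time/glucose points")
    auc = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        # Each interval contributes its width times the mean of its endpoints.
        auc += dt * (glucose_mmol_l[i] + glucose_mmol_l[i - 1]) / 2.0
    return auc


# Hypothetical OGTT readings at 0, 30, 60, 90 and 120 min (illustrative only).
times = [0, 30, 60, 90, 120]
glucose = [6.0, 12.0, 10.0, 8.0, 6.5]
total_auc = trapezoid_auc(times, glucose)
```

Computing one AUC per animal and then comparing groups statistically keeps the individual data points available for plotting, as requested elsewhere in the review.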

      • The staining and FACS data on the effects of STZ+EUG+/- ML385 are not convincing (Figure 6 and Figure 7) and do not seem to align with the bar graphs and the conclusions in the text. It would be good to include immunofluorescent staining for insulin to further validate the effects of STZ+EUG+/- ML385 on insulin expression.

      Thanks for your comments.

      (1) To address the mismatch between the statistical results and the pictures, we repeated the statistics of the NRF2 and HO-1 immunofluorescence results for the revised manuscript, as follows:

      (a) NRF2 immunofluorescence staining:

      Author response images 2-6. NRF2 immunofluorescence staining, Groups 1-5 (one image per group).

      Author response image 7. NRF2 immunofluorescence staining statistics.

      (b) HO-1 immunofluorescence staining:

      Author response images 8-12. HO-1 immunofluorescence staining, Groups 1-5 (one image per group).

      Author response image 13. HO-1 immunofluorescence staining statistics.

      (2) The quadrants of the flow cytometry analysis represent the following: Q1 contains necrotic cells, characterized by positive PI staining and negative Annexin V staining; Q2 contains late apoptotic cells, positive for both PI and Annexin V; Q3 contains early apoptotic cells, positive for Annexin V and negative for PI; Q4 contains living cells, negative for both PI and Annexin V. In the experiment, the number of apoptotic cells was calculated as the sum of late apoptotic cells in Q2 and early apoptotic cells in Q3. As shown in Figure 9F-G, these results were consistent with those observed in Figure 6G, 6J and Figure 7D-F.
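The quadrant logic can be sketched as follows, assuming the common Annexin V/PI convention (PI-positive only: necrotic; double positive: late apoptotic; Annexin V-positive only: early apoptotic; double negative: living). The cutoffs, function names, and event intensities are hypothetical and stand in for gates that would normally be set from unstained and single-stained controls.

```python
def classify_event(annexin_v, pi, annexin_cutoff, pi_cutoff):
    """Assign one cytometry event to a quadrant from its two intensities."""
    annexin_pos = annexin_v >= annexin_cutoff
    pi_pos = pi >= pi_cutoff
    if pi_pos and not annexin_pos:
        return "Q1_necrotic"
    if pi_pos and annexin_pos:
        return "Q2_late_apoptotic"
    if annexin_pos:
        return "Q3_early_apoptotic"
    return "Q4_live"


def apoptosis_rate(events, annexin_cutoff, pi_cutoff):
    """Fraction of events in Q2 + Q3 (late plus early apoptotic cells)."""
    if not events:
        raise ValueError("no events to classify")
    counts = {"Q1_necrotic": 0, "Q2_late_apoptotic": 0,
              "Q3_early_apoptotic": 0, "Q4_live": 0}
    for annexin_v, pi in events:
        counts[classify_event(annexin_v, pi, annexin_cutoff, pi_cutoff)] += 1
    apoptotic = counts["Q2_late_apoptotic"] + counts["Q3_early_apoptotic"]
    return apoptotic / len(events)


# Hypothetical (Annexin V, PI) intensities, one event per quadrant.
demo_events = [(10, 10), (10, 500), (500, 500), (500, 10)]
demo_rate = apoptosis_rate(demo_events, annexin_cutoff=100, pi_cutoff=100)
```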

      (3) MIN6 cells, a mouse islet β cell line, secrete insulin. STZ treatment caused an absolute decrease in the number of islet β cells, so insulin immunofluorescence staining would only show the decrease in the number of MIN6 cells in each group. In addition, insulin protein expression is routinely assessed by ELISA, which measures the insulin secreted into the cell supernatant. Figure 6E shows the ELISA results for insulin secretion in the cell supernatant.

      • The experimental design for the in vitro experiments was unclear from the text. Consider including a schematic to show when cells were treated with STZ, EUG, and ML385.

      Thanks for your suggestion. The experimental design for the in vitro experiments of this study has been added in Figure 6A of the revised manuscript.

      • As stated in the Discussion, the use of the insulinoma line Min6 as a model instead of primary pancreatic beta cells is a clear limitation of the study. The mechanistic data would be stronger if validated on a more relevant system (eg. untransformed Islets).

      Thanks for your comments. Mouse insulinoma cells (MIN6 cell line) are a permanent cell line isolated from mouse islet β cell tumors, which can reflect the functional changes of islet β cells. As mature islet cells, MIN6 cells have been widely used as an in vitro cellular model in diabetes research to investigate the function of β cells within pancreatic islets[1, 2, 12]. MIN6 cells were therefore used as the in vitro cell model in this study. In our future studies, we will try to validate our findings using more relevant in vitro cell systems (e.g., islets).

      References:

      (1) WU M, CHEN W, ZHANG S, et al. Rotenone protects against β-cell apoptosis and attenuates type 1 diabetes mellitus [J]. Apoptosis, 2019, 24(11-12): 879-91.

      (2) LUO C, HOU C, YANG D, et al. Urolithin C alleviates pancreatic β-cell dysfunction in type 1 diabetes by activating Nrf2 signaling [J]. Nutr Diabetes, 2023, 13(1): 24.

      (12) CHEN H, LOU Y, LIN S, et al. Formononetin, a bioactive isoflavonoid constituent from Astragalus membranaceus (Fisch.) Bunge, ameliorates type 1 diabetes mellitus via activation of Keap1/Nrf2 signaling pathway: An integrated study supported by network pharmacology and experimental validation [J]. J Ethnopharmacol, 2024, 322: 117576.

      • The use of small molecule inhibitors such as ML385 can have unspecific effects. Genetic manipulation or the use of siRNAs to inhibit the NRF2 pathway would have been preferable for the in vitro experiments.

      Thanks for your constructive suggestion. ML385 is a commonly used and stable inhibitor of NRF2 and has been used in studies of a variety of diseases[13-15]. The MIN6 cells used in this study were difficult to culture and grew slowly. Owing to the cytotoxicity of siRNA transfection reagents, a significant proportion of MIN6 cells died following transfection. Consequently, the small molecule inhibitor ML385 was used in this investigation. In our future studies, we will try to validate our findings using siRNAs.

      References:

      (13) DANG R, WANG M, LI X, et al. Edaravone ameliorates depressive and anxiety-like behaviors via Sirt1/Nrf2/HO-1/Gpx4 pathway [J]. J Neuroinflammation, 2022, 19(1): 41.

      (14) WANG Z, YAO M, JIANG L, et al. Dexmedetomidine attenuates myocardial ischemia/reperfusion-induced ferroptosis via AMPK/GSK-3β/Nrf2 axis [J]. Biomed Pharmacother, 2022, 154: 113572.

      (15) LI J, DENG S H, LI J, et al. Obacunone alleviates ferroptosis during lipopolysaccharide-induced acute lung injury by upregulating Nrf2-dependent antioxidant responses [J]. Cell Mol Biol Lett, 2022, 27(1): 29.

      • The study proposes a mechanism in which EUG-induced disruption of KEAP1 and NRF2 interaction leads to NRF2 translocation to the nucleus and upregulation of proteins required to prevent oxidative stress. In Figure 6H it is unclear whether the nuclear NRF2 increases. Please add quantifications of the immunostainings.

      Thanks for your reminder. Figure 6J shows the quantifications of the immunostainings of NRF2 in the revised manuscript.

      • Some of the figure legends lack important information. In Figure 5A, 6E for instance, what is the protein expression normalized to?

      Thanks for your constructive suggestion. Protein normalization standardizes proteins from different sources and with different properties, so that protein content and expression can be compared across samples. It is an essential step in WB experiments. Western blots of nuclear protein generally cannot use β-Actin as an internal reference, because β-Actin is a reference protein that is not found in the nucleus; Lamin B was therefore chosen. N-NRF2, as a nuclear protein, requires Lamin B as the reference for protein normalization. The missing WB information has been added to the figure legends of the revised manuscript.
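The normalization described here amounts to dividing each target band by its loading control and then expressing every group relative to the control group. The sketch below assumes densitometry values are already available; the group names, intensities, and function names are hypothetical, and the study's actual quantification procedure is not specified in this response.

```python
def normalize_band(target_intensity, reference_intensity):
    """Ratio of a target band to its loading control (e.g. N-NRF2 / Lamin B)."""
    if reference_intensity <= 0:
        raise ValueError("reference band intensity must be positive")
    return target_intensity / reference_intensity


def fold_change_vs_control(bands, control_group):
    """Express each group's normalized ratio relative to the control group."""
    ratios = {group: normalize_band(target, reference)
              for group, (target, reference) in bands.items()}
    control_ratio = ratios[control_group]
    return {group: ratio / control_ratio for group, ratio in ratios.items()}


# Hypothetical densitometry values: (nuclear NRF2 band, Lamin B band).
demo_bands = {"Control": (1200.0, 1000.0), "STZ": (600.0, 1000.0)}
demo_fold = fold_change_vs_control(demo_bands, "Control")
```

Stating the reference protein and the control group in the figure legend gives the reader what is needed to interpret the y-axis.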

      • Please acknowledge previous literature on the effects of EUG/clove oil in diabetes models. The meta-analytical review by Carvalho et al. (DOI: 10.1016/j.phrs.2020.105315) should be cited and discussed.

      Thanks for your suggestion. It has been cited and discussed in the revised manuscript.

      • Consider revising the text for grammar, language mistakes, and readability. The text is not always precise (e.g. in the explanation of gamma-H2AX in the results), does not explain terminology (e.g. the oxidative stress markers - line 204+205), or simplifies conclusions (e.g. "improved islet function" based on glucose tolerance test", line 129).

      Thanks for your comments. The above problems have been addressed in the revised manuscript. In addition, we have sent our manuscript to a professional English language editing company to improve the paper, and the editing certificate has been submitted as a supplementary document.

      • In the current format, some figures are out of focus. Please make sure to upload a high-quality version for publication.

      Thanks for your suggestion. High-quality versions of the figures have been uploaded. The file may have been compressed after upload because of its size, leaving the figures out of focus. All figures in this study have therefore been uploaded separately for download in the review system.

      Reviewer #2 (Recommendations For The Authors):

      Below are specific points of criticism on the experiments presented.

      (1a) There is no comparison among eugenol treatments with regards to fasting weight, blood glucose, water intake, food intake, and, crucially, OGTT. All three treatments appear to show very similar effects but has this been statistically assessed? Shown statistical significance of ketonuria between no and high eugenol treatments seems exaggerated.

      Thanks for your comments. EUG intervention has a dose-dependent effect on T1DM. According to Figure 1B-I, 20 mg/kg EUG has the best effect. Fasting body weight, blood glucose, water intake, food intake, and OGTT were statistically assessed in Figure 1 of the revised manuscript. In addition, we repeated the statistical analysis of ketonuria between no and high eugenol treatments in the revised manuscript. We have also revised the description of eugenol's efficacy to make it more objective.

      (b) ITT is not used to detect T1DM (line 126).

      Thanks for your suggestion. T1DM primarily manifests as pancreatic β cell damage and an absolute reduction of insulin secretion, resulting in disordered glucose metabolism in vivo. The oral glucose tolerance test (OGTT) is a series of plasma glucose concentrations measured within 2 h after oral gavage of a defined amount of glucose. It is a standard method to evaluate an individual's ability to regulate blood glucose and to assess the function of islet β cells. Insulin resistance is a reduction, for various reasons, in the efficiency with which insulin promotes glucose uptake and utilization; the body compensates by secreting excess insulin, which leads to hyperinsulinemia, to keep blood glucose stable. The insulin tolerance test (ITT) is commonly employed to detect insulin resistance in T2DM. However, we found that the ITT has little relevance to T1DM. Therefore, the ITT experiment and the related description have been removed from the revised manuscript.

      (2) Here it is hard to reconcile the gradual increase of Ins protein levels in (STZ) and (STZ + increasing eugenol) samples with (a) results in 1 suggesting that the dose of eugenol does not significantly affect the outcome and (b) Ins expression, which is essentially undetectable in both STZ and STZ+EUG mice. A likely explanation is that EUG just postpones beta cell death. I assume that these analyses were done in week 10 but it is not stated.

      Thanks for your professional suggestion. Perhaps because the file was compressed, the gray values of the WB bands are not distinct, so the expression of INS cannot be seen clearly. In fact, STZ treatment resulted in a significant decrease in INS expression compared with the Control group, which was alleviated by EUG treatment. However, because INS levels in the STZ and STZ + EUG groups differ greatly from those in the Control group, the gray values of INS in the STZ and STZ + EUG groups were not clear. As mentioned in methods 4.12-4.13, our WB and PCR samples were from 10-week mice.

      (3) The γH2Ax stainings provided are weak and do not fully correspond to the quantitation - the 5 mg/Kg EUG treatment appears less severe than the 10 mg/Kg. In contrast, changes in the PCD pathway are convincingly demonstrated.

      Thanks for your reminder. γH2AX immunohistochemical staining is required to be located in the islets. It measured the number of β cells stained with brown, not the brown area. The ZOOM image of γH2AX staining showed that the EUG improvement effect of 10 mg/kg was better than that of 5 mg/kg. γH2AX, as a marker of DNA damage, exhibits nuclear localization and is absent in the cytoplasmic compartment. Therefore, in Figure 4C-D, we quantified the proportion of cells exhibiting brown staining. In Figure 4C, black arrows were employed to highlight the presence of brown-stained islet β cells.

      (4) Is there a reason for looking at mRNA levels of Ho-1 but not KEAP1 or NQO-1 ? What is the expression of Nrf2 itself at the RNA level? Please give in the text what the abbreviations MDA, SOD, CAT GSH-Px stand for. Are these protein levels or activity assays? Units in the y-axis of graphs?

      Thanks for your constructive suggestion. The required KEAP1 and NQO-1 primers have been synthesized, and the relevant data have been added to the revised manuscript. The expression of Nrf2 itself at the RNA level is T-NRF2 (Total NRF2). The abbreviations MDA, SOD, CAT and GSH-Px stand for malondialdehyde, superoxide dismutase, catalase, and glutathione peroxidase, respectively; this information has been added to the revised manuscript. These are activity assays performed on serum, and units have been added to the y-axes of the graphs in the revised manuscript.

      (5) The Ins levels in the culture medium of STZ + ML treated cells are much lower than the levels in STZ treated cells (6D). This is not consistent with the results of Ins cell content or Ins expression as stated (6B and D).

      Thanks for your careful review. The experimental samples in Figure 6C of the revised manuscript are proteins extracted from the cells of each group, while the experimental samples in Figure 6E are the cell supernatants of each group. ML385 is an inhibitor of NRF2 that effectively suppresses the NRF2 signaling pathway and aggravates MIN6 cell damage, resulting in lower INS expression in the STZ+ML385 group than in the STZ group in both Figures 6C and 6E. Although the sample sources of the two panels differ and the trends vary slightly, INS levels in the STZ+ML385 group are overall lower than those in the STZ group.

    1. eLife Assessment

      This work is important because it elucidates how immune cells migrate across the blood brain barrier. In the revised version of this study, the authors present a convincing framework to visualize, recognize and track the movement of different immune cells across primary human and mouse brain microvascular endothelial cells without the need for fluorescence-based imaging using microfluidic devices. This work will be of interest to the cancer biology, immunology and medical therapeutics fields.

    2. Reviewer #1 (Public review):

      Summary:

      It is evident that studying leukocyte extravasation in vitro is a challenge. One needs to include physiological flow, culture cells and isolate primary immune cells. Timing is of utmost importance and a reproducible setup essential. Extra challenges are met when extravasation kinetics in different vascular beds is required, e.g., across the blood-brain barrier. In this study, the authors describe a reliable and reproducible method to analyze leukocyte TEM under physiological flow conditions, including this analysis. That the software can also detect reverse TEM is a plus.

      Strengths:

      It is quite a challenge to get this assay reproducible and stable, in particular as there is flow included. Also for the analysis, there is currently no clear software analysis program, and many labs have their own methods. This paper gives the opportunity to unify the data and results obtained with this assay under label-free conditions. This should eventually lead to more solid and reproducible results.

      Also, the comparison between manual and software analysis is appreciated.

      Weaknesses:

      The authors stress that it can be done in BBB models, but I would argue that it is much more broadly applicable. This is not necessarily a weakness of the study but more an opportunity to strengthen the method. So I would encourage the authors to rewrite some parts and make it more broadly applicable.

    3. Reviewer #2 (Public review):

      Summary:

      This paper develops an under-flow migration tracker to evaluate all the steps of the extravasation cascade of immune cells across the BBB. The algorithm is useful and has important applications.

      Strengths:

      The algorithm is almost as accurate as manual tracking and importantly saves time for researchers. The authors have discussed how their tool compares to other tracking methods.

      Weaknesses:

      Applicability can be questioned because the device used is 2D and physiological biology is in 3D. However, the authors have addressed this point in their revised manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      It is evident that studying leukocyte extravasation in vitro is a challenge. One needs to include physiological flow, culture cells and isolate primary immune cells. Timing is of utmost importance and a reproducible setup essential. Extra challenges are met when extravasation kinetics in different vascular beds is required, e.g., across the blood-brain barrier. In this study, the authors describe a reliable and reproducible method to analyze leukocyte TEM under physiological flow conditions, including this analysis. That the software can also detect reverse TEM is a plus.

      Strengths:

      It is quite a challenge to get this assay reproducible and stable, in particular as there is flow included. Also for the analysis, there is currently no clear software analysis program, and many labs have their own methods. This paper gives the opportunity to unify the data and results obtained with this assay under label-free conditions. This should eventually lead to more solid and reproducible results.

      Also, the comparison between manual and software analysis is appreciated.

      We thank the Reviewer for their positive evaluation of our manuscript and for highlighting the value of obtaining more reproducible and unbiased results, as well as the detection of forward and reverse transmigration with UFMTrack.

      Weaknesses:

      The authors stress that it can be done in BBB models, but I would argue that it is much more broadly applicable. This is not necessarily a weakness of the study but more an opportunity to strengthen the method. So I would encourage the authors to rewrite some parts and make it more broadly applicable.

      We thank the Reviewer for this suggestion. In the revised version of our manuscript, we have now emphasized the broader applicability of UFMTrack to analyze the interaction of immune cells with 2-dimensional endothelial monolayers in various contexts in the abstract, introduction, and discussion sections.

      Reviewer #2 (Public Review):

      Summary:

      This paper develops an under-flow migration tracker to evaluate all the steps of the extravasation cascade of immune cells across the BBB. The algorithm is useful and has important applications.

      Strengths:

      Algorithm is almost as accurate as manual tracking and importantly saves time for researchers.

      We thank the Reviewer for this positive evaluation of our work.

      Weaknesses:

      Applicability can be questioned because the device used is 2D and physiological biology is in 3D. Comparisons to other automated tools was not performed by the authors.

      We thank the Reviewer for pointing our attention to these weaknesses in our manuscript.

      We have clarified in the revised manuscript that using 2D endothelial monolayer models in parallel laminar flow chambers is still a state-of-the-art methodology for studying the multi-step extravasation process of immune cells across endothelial monolayers under physiological flow by in vitro live cell imaging. These models provide excellent optical quality that is not yet achieved in 3D models. We have extended the introduction to emphasize the limitations of existing tools that motivated us to establish UFMTrack. We have furthermore extended the discussion section to highlight the features unique to our UFMTrack framework.

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to establish a faster and more efficient method of tracking steps of T-cell extravasation across the blood brain barrier. The authors developed a framework to visualize, recognize and track the movement of different immune cells across primary human and mouse brain microvascular endothelial cells without the need for fluorescence-based imaging. The authors succinctly describe the basic requirements for tracking in the introduction followed by an in-depth account of the execution.

      We thank the Reviewer for their positive evaluation of our manuscript and highlighting the value of label-free analysis of the multistep immune cell extravasation cascade with UFMTrack.

      Weaknesses and Strengths:

      Materials & methods and results:

      (1) The methods section also lacks details of the microfluidic device that the authors talk about in the paper. Under physiological shear stress, the T-cells detach from the pMBMEC monolayer, and are hence unable to be detected; however, this observation requires an explanation pertaining to the reason of occurrence and potential solutions to circumvent it to ensure physiologically relevant experimental parameters.

      We thank the Reviewer for pointing out this oversight. We have used a custom-made microfluidic device that has been published and described in detail before. This information has now been included in the Methods Section under Point 7, and the two references describing the flow chamber in depth are mentioned below and have been included in the manuscript.  

      Coisne Caroline, Ruth Lyck and Britta Engelhardt. 2013. Live cell imaging techniques to study T cell trafficking across the blood-brain barrier in vitro and in vivo. Fluids and Barriers of the CNS 10:7 doi: 10.1186/2045-8118-10-7; 21 January 2013

      Lyck R, Hideaki Nishihara, Sidar Aydin, Sasha Soldati and Britta Engelhardt. 2022. Modeling brain vasculature immune interactions in vitro. Angiogenesis, 2nd edition. Editors Patricia D’Amore and Diane Bielenberg. Cold Spring Harb Perspect Med doi: 10.1101/cshperspect.a041185

      T cell detachment is a physiologically relevant parameter besides T cell arrest, polarization, crawling, probing, and transmigration during the interaction with an endothelial monolayer. T cell detachment means that post-arrest, the T cell cannot engage adhesion molecules required for subsequent polarization and, eventually, transmigration. 

      (2) The authors describe a method for debris exclusion using UFMTrack that eliminates objects of <30 pixels in size from analysis based on a mean pixel size of 400 for T lymphocytes. However, this mean pixel size appears to stem from in-vitro activated CD8 T cells, which rapidly grow and proliferate upon stimulation. In line with this, activated lymphocytes exhibit increased cytoplasmic area, making them appear less dense or “brighter” by phase microscopy compared to naïve lymphocytes, which are relatively compact and subsequently appear dimmer. Given this, it is not clear whether UFMTrack is sufficiently trained to identify naïve human lymphocytes in circulating blood, nor smaller, murine lymphocytes. Analysis of each lymphocyte subtype in terms of pixel size and intensity would be beneficial to strengthen the claim that UFMTrack can identify each of these populations. Additionally, demonstrating that UFMTrack can correctly characterize the behavior of naïve versus activated lymphocytes isolated from murine and human sources would strengthen the claim that UFMTrack can be broadly applied to study lymphocyte dynamics in diverse models without additional training.

      We thank the Reviewer for the suggestion to evaluate more precisely the range of cell sizes that can be analyzed by our framework. We have included a visualization of the crawling cell sizes successfully analyzed by UFMTrack in Supplementary Figure 7. It demonstrates that human peripheral blood mononuclear cells, which are almost half the size of the activated mouse CD4 T cells used in these assays, can be successfully segmented, tracked, and analyzed with the UFMTrack framework. Thus, our UFMTrack framework is suitable for broad application to differently sized immune cells during their interaction with the endothelial cell monolayer under flow.
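The size-based debris exclusion discussed in this point can be illustrated with a generic connected-component filter on a binary segmentation mask. This is a sketch under stated assumptions, not UFMTrack's implementation: only the 30-pixel threshold is taken from the reviewer comment, while the 4-connectivity, the function names, and the toy mask are hypothetical.

```python
from collections import deque

MIN_AREA_PX = 30  # size threshold quoted in the reviewer comment


def label_components(mask):
    """Return 4-connected foreground components as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill from an unvisited foreground pixel.
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def drop_debris(mask, min_area=MIN_AREA_PX):
    """Keep only components whose area is at least min_area pixels."""
    return [p for p in label_components(mask) if len(p) >= min_area]


# Toy mask: one 6x6 "cell" (36 px) and one 2x2 speck of "debris" (4 px).
demo_mask = [[0] * 12 for _ in range(12)]
for r in range(1, 7):
    for c in range(1, 7):
        demo_mask[r][c] = 1
for r in range(9, 11):
    for c in range(9, 11):
        demo_mask[r][c] = 1
kept = drop_debris(demo_mask)
```

Because the threshold is an absolute pixel count, the smallest cell population of interest has to remain well above it at the imaging resolution used, which is the concern the reviewer raises for naïve and murine lymphocytes.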

      (3) Average precision was compared to the analysis of UFMTrack but it is unclear how average precision was calculated. This information should have been included in the methods section.

      We thank the Reviewer for pointing our attention to the missing information. We have added a subsection, “Performance Analysis”, to the Materials and Methods section, where we describe the statistical methods and the performance metrics used to evaluate the UFMTrack framework.
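For readers unfamiliar with the metric, one common definition of average precision is sketched below: detections are ranked by confidence, and the precision at the rank of each true positive is averaged over the number of ground-truth objects. This generic definition is an assumption for illustration and is not necessarily the one described in the authors' "Performance Analysis" subsection; the function name and the example flags are hypothetical.

```python
def average_precision(ranked_is_true_positive, n_ground_truth):
    """Average precision over detections sorted by descending confidence.

    ranked_is_true_positive: booleans, True where the detection at that rank
    matches a ground-truth object. n_ground_truth: total number of objects.
    """
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")
    true_positives = 0
    precision_sum = 0.0
    for rank, is_tp in enumerate(ranked_is_true_positive, start=1):
        if is_tp:
            true_positives += 1
            # Precision at this rank: matched detections / detections so far.
            precision_sum += true_positives / rank
    return precision_sum / n_ground_truth


# Hypothetical ranking: hit, miss, hit, with two ground-truth cells in total.
demo_ap = average_precision([True, False, True], n_ground_truth=2)
```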

      (4) CD4 and CD8 T cells exhibit distinct biology and interaction kinetics driven in part by their MHC molecule affinity and distinct receptor expression profiles. Thus, it is unclear why two distinct mechanisms of endothelial cell activation are needed to see differences between the populations.

      We thank the Reviewer for pointing out that different cytokine stimulations of endothelial cells were used in the assays that tested UFMTrack on CD4 and CD8 T cell interactions with the endothelial monolayer. The Reviewer is correct that CD4 and CD8 T cells use different mechanisms to cross the pMBMEC monolayer, as shown by us (doi: 10.1002/eji.201546251) and others, and that recognition of cognate antigen on MHC class I on pMBMECs will arrest CD8 T cells and lead to CD8 T-cell-mediated apoptosis (doi: 10.1038/s41467-023-38703-2). However, the focus of the present study was not on comparing CD4 and CD8 T cell interactions with the pMBMEC monolayer but rather on testing the suitability of UFMTrack for studying the different multi-step transmigration of these T cell subsets across the endothelial monolayer.

      (5) The BMECs are barrier tissues but were cultured on µdishes in this study. To study the transmigration of T-cells across the endothelium, the model would have been more relevant on a semi-permeable membrane instead of a closed surface.

      We understand the critique of the Reviewer, but laminar flow chambers with endothelial monolayers still provide a state-of-the-art and established methodology to study immune cell migration across endothelial monolayers by in vitro live cell imaging including endothelial cells forming the blood-brain barrier.  

      (6) Methods are provided for the isolation and expansion of human effector and memory CD4+ T cells. However, there is no mention of specific CD4+ T cell populations used for analysis with UFMTrack, nor a clear breakdown of tracking efficiency for each subpopulation. Further, there is no similar method for the isolation of CD8+ T cell compartments. A clear breakdown of the performance efficiency of UFMTrack with each cell population investigated in this study would provide greater insight into the software’s performance with regard to tracking the behavior and movement of distinct immune populations.

      We thank the Reviewer for this comment. Since a fair performance evaluation requires collecting reliable and consistent manual annotations, in this work we have performed such an analysis only for the mouse CD8 T-cell population migrating on the pMBMEC monolayer. We chose this population as a reference because it differs from the one the segmentation model was trained on. This gives an indication of the performance to be expected when studying immune cell types other than those used for model development.

      (7) The results section is quite extensive and discusses details of establishment of the framework while highlighting both the pros and cons of the different aspects of the process, for example the limitation of the two models, 2D and 2D+T were highlighted well. However, the results section includes details which may be more fitting in the methods section.

      We thank the Reviewer for highlighting the extensive work carried out in the development of our UFMTrack framework. We decided to include in the results section only the description of key elements and design decisions taken when developing the framework, such as the need to include a time series of images for successful segmentation of the transmigrated cells. At the same time, the majority of implementational details can be found in the Supplementary Material.

      (8) A few statements in the results section lacked literary support, which was not provided in the discussion either, such as support for increased variance of T-cell instantaneous speed on stimulated vs non-stimulated pMBMECs. Another example is the enhancement of cytokine stimulation directed T-cell movement on the pMBMECs that the authors observed but failed to relay the physiological relevance of it. The authors don’t provide enough references for developments in the field prior to their work which form the basis and need for this technology.

      We thank the Reviewer for this comment and for asking for literature references. However, we cannot provide such references as these are original observations we made by employing the UFMTrack framework.  This shows that UFMTrack observes T-cell behaviors that have previously been overlooked. Their physiological relevance will have to be explored in separate studies. We have extended the introduction section to include the details on the existing methods developed in the field, as well as their weaknesses that motivated the development of the UFMTrack framework.

      (9) The rationale for use of OT-1 and 2D2-derived murine lymphocytes is unclear here. The OT-1 model has been generated to study antigen-specific CD8+ T cell responses, while the 2D2 model has been generated to recapitulate CD4 T cell-specific myelin oligodendrocyte glycoprotein (MOG) responses.

      To establish and test the UFMTrack framework, we made use of the specific T-cell subsets and endothelial cell models we generally use within our research context. Especially for animal work, this is in keeping with the 3R principles, which call for reducing animal experimentation.

      Figures and text:

      (1) There are certain discrepancies and misarrangement of figures and text. For example, discussion of the effect of shear flow on T cell attachment as part of the introduction in figure 1 and then mentioning it in the text again in the results section as part of figure 4 is repetitive.

      We thank the Reviewer for pointing our attention to this misarrangement. We have adjusted the label of Figure 4 to emphasize that this effect is correctly captured by the UFMTrack.

      (2) Section IV, subsection 1 of the results section, refers to ‘data acquisition section above’ in line 279, however the said section is part of materials and methods which is provided towards the end of the manuscript.

      We thank the Reviewer for pointing our attention to this misarrangement. We have adjusted the text to reflect the correct chapter order.

      (3) There are figures in the manuscript that have not been referenced in the results section, for example, figure 3A and B. Figure 1 hasn’t been addressed until subsection 7 of materials and methods

      We thank the Reviewer for pointing our attention to this misarrangement. We have adjusted the text to refer to all figure panels and to the clarification of the cell multiplicity estimation in the supplementary information section. References to Figure 1 were added in the introduction section to illustrate the in vitro under-flow imaging setup as well as the typical T cell behaviors in such experiments.

      (4) A lack of significance but an observed trend of increased variance of T cell instantaneous speed is reported in line 296-298; however, the graph (figure 4G) shows a significant change in instantaneous speed between non-stimulated and TNFα-stimulated systems. This is misleading to the readers.

      We thank the Reviewer for pointing our attention to this discrepancy. We have expanded the text to indicate a low statistical significance for the TNF condition and no significance, only a trend, for the IL1-beta condition.

      (5) The authors talk about three beginner experimenters testing the manual T cell tracking process but figure 5 only showcases data from two experimenters without stating the reason for excluding experimenter 1.

      We thank the Reviewer for pointing our attention to this ambiguity. While both the migration analysis and the manual cell tracking were performed by all three beginner experimenters, the cell tracking data for the first one was unfortunately lost due to a hardware failure.

      Discussion:

      (1) While the discussion captures the major takeaways from the paper, it lacks relevant supporting references to relate the observation to physiological conditions and applicability.

      This study is not about the physiological relevance of the microfluidic devices and immune cells used but rather about advancing methodology to analyze dynamic immune cell behavior on endothelial monolayers under physiological flow. Therefore, the discussion does not extend to comparing the physiological relevance of the specific in vitro models employed in this study.   

      (2) The discussion lacks connection to the results since the figures were not referenced while discussing an observed trend

      We thank the Reviewer for pointing our attention to this misarrangement. We have included the references to the relevant figures as well as supporting references.

      (3) The authors briefly looked into mouse and human BMECs and their individual interaction with T cells, but don’t discuss the differences between the two, if any, that challenged their framework.

      We thank the Reviewer for pointing our attention to this weakness. We have added to the discussion section clarifications on the challenges of analyzing the T cell interactions with the HBMEC and the BMDM interactions with the pMBMEC monolayer.

      (4) Even though the imaging tool relies on difference in appearance for detection, the authors talk about lack of feasibility in detecting transmigration of BMDMs due to their significantly different appearance. The statement lacks a problem-solving approach to discuss how and why this was the case.

      We thank the Reviewer for pointing our attention to this weakness and apologize for the misleading explanation of the problem of analyzing the BMDM sample. Since the transmigrated part of the macrophages differs in appearance from a transmigrated part of a T cell, its detection by a Deep Neural Network trained on the T cell data is worse than that for the T cells. At the same time, the detection performance before the transmigration is sufficient for the BMDM migration analysis. The potential approaches to alleviate this are added to the discussion section.

      Relevance to the field:

      Utilizing the framework provided by the authors, the application can be adapted and/or utilized for visualizing a range of different cell types, provided they are different in appearance. However, this would require extensive changes to the script and won’t be adaptable in its current form.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors should announce in the abstract that the software analysis Track is downloadable and free to use for all researchers. They may consider providing some sort of helpdesk, although I realize that that may run into too much time.

      As said above, they stress that it can be done in BBB models, but I would argue that it is much more broadly applicable.

      We thank the Reviewer for these suggestions. We have emphasized the broader applicability of UFMTrack in the abstract and pointed out the public availability of the code and data.

      Can they add an experiment that shows that it also works for neutrophils for example? I understand that on paper yes it should work, but the neutrophils are of course faster etc.

      This is an excellent suggestion, but we tested UFMTrack within the current framework of ongoing research, which does not include the investigation of neutrophil transmigration across endothelial monolayers.  

      Also, the combination of different leukocytes in one TEM assay would really be a step forward. If the software can detect different-sized leukocytes, then this should be possible.

      We thank the Reviewer for this suggestion. We have added Supplementary Figure 7, demonstrating the range of cell sizes that were successfully analyzed by the UFMTrack framework throughout our manuscript. We also added a statement to the discussion that according to this data, “simply by discriminating cells by size, it is possible to extend UFMTrack to study the interaction of several types of immune cells migrating on top of a cellular monolayer under flow.”

      Extra challenges: can the method also discriminate between paracellular and transcellular migration modes? In particular for T-cells this is known to happen.

      We thank the Reviewer for this suggestion. We have added this to the potential applications of UFMTrack in the discussion section. While this differentiation is not feasible when relying solely on the phase-contrast imaging data, UFMTrack can simplify the analysis by automatically providing predictions of the transmigration locations, which can then guide analysis of the fluorescence data for junctional labels.

      Reviewer #2 (Recommendations For The Authors):

      This paper develops an under-flow migration tracker to evaluate all the steps of the extravasation cascade of immune cells across the BBB. The algorithm is useful and has important applications. There are several points that need to be addressed, particularly about the claims made by the authors.

      Please see the comments below for more details:

      • Lines 88-92: Add a citation for the characteristics of the BBB as a barrier

      We have added two references accordingly.  

      • Lines 94-95: Can the authors indicate what models were used for these studies and how those compare to their in vitro model? In addition, can the authors say whether T cells were manually tracked in this study to translate results to the clinic and whether the results were successful when translated to the clinic? This may enhance the argument that automatic trackers are needed if the translation was not 100% successful

      This introductory paragraph summarizes in vivo and in vitro observations from several laboratories. Although these studies include manual tracking of T cells, they do not necessarily distinguish all sequential steps of the multi-step T cell transmigration cascade. Thus, automated tracking may provide additional insights, allowing for increased translation of findings to the clinic.  

      • Lines 96-98: Citing the work of Roger Kamm and Noo Li Jeon would be helpful here as they pioneered these BBB microfluidic models and have protocol papers on how to build them and how to use them for cancer cell extravasation studies. Roger Kamm has also worked on several extravasation studies with neutrophils, monocytes, and PBMCs from 3D vasculatures in microfluidic devices, under flow using pressurized fluid or recirculating pumps. Mentioning those would be helpful as they are directly related to what the authors are presenting in their paper.

      We thank the Reviewer for this comment, and we consider the work of Roger Kamm and Noo Li Jeon very valuable for the field. However, these authors have focused on developing functional 3D microfluidic devices, including, e.g., all cells of the neurovascular unit, which is not the focus of the present study, which solely employed parallel flow chamber devices and endothelial monolayers.

      • Lines 110-116: Can the authors comment on the use of ImageJ or similar automatic tracking tools and how these compare to the under-flow migration tracker developed in this paper? Several groups use ImageJ to track cellular migration successfully and in an automatic manner with short intervals between each frame. One paper that comes to mind is Chen et al: DOI: 10.1073/pnas.1715932115 where neutrophil migration in 3D was assessed with ImageJ in microfluidic devices of the vasculature. If the authors can highlight differences between their tool and what is currently available and used for automatic tracking (e.g. ImageJ), this would help in understanding the advantages of the migration tracker developed in this paper.

      • Lines 118-121: Add citations for the current state of the art for T cell extravasation tracking

      We thank the Reviewer for these suggestions. We have extended the introduction to add more details on the available tools for tracking migrating immune cells and their limitations, as well as the discussion section to emphasize the features unique to the developed UFMTrack framework.

      • Figure 1: The device used by the authors is considered to be a 2D microfluidic device with a monolayer of mouse brain endothelial cells. I would recommend the authors to carefully revise the claims made in the paper to mention that this is a 2D device as opposed to a 3D device, in order to not mislead readers who may be expecting these analyses to be performed in 3D vasculatures.

      We thank the Reviewer for this suggestion. We have included in the summary a mention of the 2-dimensional nature of the employed BBB model.

      • Figure 1: The T cells used in this study are not fluorescently-labeled but the authors mention that this is an issue from current state-of-the-art tools. I would recommend that the authors remove this point as being an issue because it is not addressed in their paper. The T cells are also not labeled in this study so this limitation of other systems is not addressed in this paper.

      We apologize to the Reviewer as we do not understand this question. Many experimental conditions do not allow the study of fluorescently tagged T cells. Therefore, UFMTrack is tailored to follow and analyze T cells and other immune cells during their interaction with endothelial monolayers independent of a fluorescence tag.

      • Figure 1: Was the shear stress controlled manually with a syringe? Or with the use of a pressure controller? I would clarify this aspect and discuss human errors that can be introduced from manually controlling the pressure applied to the monolayer.

      We thank the Reviewer for pointing our attention to this ambiguity. We have added a mention of the automated syringe pump used to control the shear stress in the text where the values of shear stress applied to the sample are first mentioned.

      • Figure 1: Does T cell attachment occur within the first 5 minutes? Can the authors comment on how they chose this timeline and the percentage of T cells that are washed off at the second step at 1.5 dynes/cm^2? Is 30 seconds enough to ensure all the non-adhered T cells are washed off with 1.5 dynes/cm^2?

      Superfusion of the T cells over the endothelial monolayer is performed under 0.5 dynes/cm^2 to allow the T cells to settle on the endothelial cell monolayer under flow. After increasing to physiological flow, non-adherent T cells detach within 30 seconds, as described by the Reviewer. We have included in the Methods Section under Point 7 the references describing in depth the design of the flow chamber device and the methods used here.

      • Line 154: How many images were used in the training vs. testing dataset for T cell migrations?

      We thank the Reviewer for pointing our attention to this missing information. We have added the sizes of the training and validation datasets. Specifically, the 226 MPix of available imaging data was split into 154 MPix training and 37 MPix validation sets. The gap in between was introduced to avoid a correlation between the validation and training sets that would compromise the performance evaluation.
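
      The size of the gap is not stated explicitly, but it follows from the three quoted figures. The sketch below is only a worked check of that arithmetic, assuming the gap is simply the remainder of the 226 MPix that is assigned to neither set; the function name is hypothetical and not part of UFMTrack.

```python
# Figures quoted in the response above, in megapixels (MPix).
TOTAL_MPIX = 226
TRAIN_MPIX = 154
VALIDATION_MPIX = 37


def gap_size(total, train, validation):
    """Return the amount of data left unused between the two sets.

    Assumption: everything not assigned to training or validation
    forms the gap that decorrelates the two sets.
    """
    gap = total - train - validation
    if gap < 0:
        raise ValueError("training and validation sets exceed the total")
    return gap


gap_mpix = gap_size(TOTAL_MPIX, TRAIN_MPIX, VALIDATION_MPIX)
print(f"gap: {gap_mpix} MPix ({gap_mpix / TOTAL_MPIX:.1%} of the data)")
```

      Under this assumption the unused gap amounts to 35 MPix, roughly 15% of the available imaging data.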

      • Are the supplementary videos at real speed or accelerated?

      We thank the Reviewer for pointing our attention to this missing information. The videos are sped up by a factor of 96. We have added this information to the Supplementary video descriptions.  
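
      As a rough orientation for readers of the videos, the speed-up factor can be converted into a playback duration. The sketch below assumes that a video covers one full 32-minute recording (the dataset length given elsewhere in this response); whether each supplementary video actually spans a full recording is an assumption, and the function name is hypothetical.

```python
SPEEDUP_FACTOR = 96      # stated in the response above
RECORDING_MINUTES = 32   # length of one imaging dataset, as stated in this response


def playback_seconds(recording_minutes, speedup):
    """Convert real recording time into playback time at a given speed-up."""
    return recording_minutes * 60 / speedup


duration = playback_seconds(RECORDING_MINUTES, SPEEDUP_FACTOR)
print(f"{RECORDING_MINUTES} min of recording plays back in {duration:.0f} s")
```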

      • Lines 208-216: Can the authors comment on how their initial adhesion timeframe of 30 sec before starting the recording at 5.5 min affects the number of T cells with rapid displacement? 30 seconds may not be enough to ensure T cells have adhered to the endothelium.

      Please see our comment above. The methodology used in the present assays has been set up and validated in numerous publications. We have included in the Methods Section under Point 7 the references describing in depth the design of the flow chamber device and the methods used here.  

      • Lines 275-277: Was the number of testing images 18? Can the authors comment on how this compares to training dataset size and whether these numbers are enough to achieve robust results?

      We apologize for this ambiguity in our manuscript. The framework was evaluated on 18 imaging datasets, each corresponding to 32 minutes of recording, not 18 images. We have added this clarification to the “CD4+ T cell analysis” subsection. The total size of these datasets is 18 datasets × 191 timeframes/dataset × 9.9 MPix/frame ≈ 34 GPix.
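
      The product can be checked directly. The sketch below takes the three stated factors at face value (18 datasets, 191 timeframes per dataset, 9.9 MPix per frame); with the frame size expressed in MPix, the result comes out in MPix, which is on the order of 34 thousand MPix, i.e. about 34 GPix.

```python
# Factors as quoted in the response above.
N_DATASETS = 18
FRAMES_PER_DATASET = 191
MPIX_PER_FRAME = 9.9

# Total evaluated imaging data, first in MPix, then converted to GPix.
total_mpix = N_DATASETS * FRAMES_PER_DATASET * MPIX_PER_FRAME
total_gpix = total_mpix / 1000

print(f"total: {total_mpix:,.0f} MPix (about {total_gpix:.0f} GPix)")
```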

      • Figure 4B: Can the authors add statistics here? Individual datapoints on the error bars would be helpful too. 

      We thank the Reviewer for pointing our attention to this weakness. The data corresponds to the statistical errors as evaluated based on all cells in the 18 datasets. We have added the total number of cells in each of the endothelium stimulation conditions to the text.

      • Figure 4C-J: Can the authors put individual datapoints here as well and explain whether they considered each T cell to be one datapoint or each endothelium (averaging all T cells) to be one datapoint? 

      We thank the Reviewer for this suggestion. However, adding about one thousand points corresponding to each cell would be impractical. We therefore present the distributions of the metrics evaluated from the data as histograms on violin plots rather than as swarm plots.

      • Figure 4: Did the authors wash the monolayers before introducing T cells? Soluble unbound cytokines may still be present and there are two different questions that would be studied here: “Is the inflamed endothelium affecting T cell migration?” (if washing was performed) or “Is T cell and microenvironmental inflammation affecting T cell migration?” (if no washing was performed)

      The endothelial monolayers are “washed” by starting the flow in the flow chamber device before superfusing the T cells over the endothelial monolayer. We agree that our flow chamber device combined with UFMTrack will make it possible to address all these questions.

      • Figure 4I: Are all the T cells decelerating? (negative AM speed)

      We thank the Reviewer for this question. The cells are moving along the flow, which, in our experiments, is from left to right. The vector of speed is thus pointing against the x-axis, and thus the AM speed is negative.

      • Lines 302-306: Please explain how this compares to ImageJ or similar trackers that can achieve similar outputs.

      We thank the Reviewer for this question. We have added a statement in the “T-cell tracking” section emphasizing that standard trackers are incapable of correctly capturing large displacements.

      • Lines 306-309: It is not lower for TNF stimulation though. How do the authors address this? TNF is also a pro-inflammatory cytokine.

      We have previously shown that stimulation of pMBMECs with IL-1 and TNF-α induces different cell surface levels of ICAM-1 and VCAM-1, which will influence T cell behavior on the pMBMEC monolayer.

      • Lines 313-315: Could this be because the monolayer was not washed and soluble cytokines affected T cell response directly?

      Please see our answer to lines 306-309.  

      • Lines 319: Please cite Roger Kamm and Noo Li Jeon’s papers on BBB models with human BMECs, pericytes and astrocytes in 3D microfluidic devices.

      We thank the Reviewer again for pointing out these studies. As mentioned above, as our present study does not explore 3D models of the BBB, we think it does not fit into the framework of our study to elaborate on 3D models of the BBB. In addition, this would require the inclusion of a discussion of the work of others like, e.g., Peter Searson and others.  

      • Figure 5: Several statistics are missing from parts of the figure. Please add those.

      We apologize – but we do not understand which statistical analysis the Reviewer is missing from this Figure.  

      • Can the authors comment on the number of T cells perfused over the monolayer and if this ratio of T cells to endothelial cells makes physiological sense? Too many T cells may result in endothelium inflammation and increased diapedesis.

      The number of T cells used to superfuse over the endothelial monolayer is tested to avoid aggregation of T cells in suspension and thus artificial interactions with the endothelial monolayer. T cell behavior on the pMBMEC monolayer remains the same over a 10-fold dilution range.

      • Lines 381-383: How does this compare to analyses that look at the cross-section of the endothelium? It is difficult to assess transmigration looking at the top view of the endothelium. Perhaps, cross-section assessments will identify differences in manual vs. automatic tracking.

      To the best of our knowledge, there is no microscopic device that would allow in vitro live cell imaging of a live endothelial monolayer (that is, in the presence of tissue culture medium) from the side at a resolution sufficient to define transmigration. Our current study instead shows that UFMTrack can distinguish cells moving above or below the endothelial monolayer.

      • Figure 5J: This is probably the most important argument of the paper. If the authors can show statistical differences in their graph, this would greatly help convince readers that this tool is necessary and actually computationally efficient compared to manual work by researchers.

      We thank the Reviewer for this suggestion. However, comparing a single data point for automated measurement with four manual experimenter analysts is not a statistically sound comparison. We believe that Figure 5K clearly shows the factor of 5 difference in analysis speed compared to manual analysis. More importantly, though, the automated analysis consumes only machine time, removing the need for the experimenter to invest even 1/5th of the original analysis time.

      • Figure 6: Did the authors use autologous immune cells and endothelial cells? This is particularly relevant with the use of human-derived T cells (line 436) on the BMEC monolayer. Can the authors comment on non-self reactivity by the T cells encountering BMEC from another human subject?

      Autologous T cell interaction with BMECs would only be possible when using hiPSC-derived EECM-BMECs and the T cells from the same individual. All other experimental frameworks will not include autologous interactions. This is the experimental framework used by most authors studying immune cell interactions with commercially available donors. We have not studied alloreactive interactions in our assays and thus cannot further comment.  

      • Figure 6M,N,O: How does this compare to ImageJ for tracking of fluorescent cells? I recommend the authors to try that, at least for this section, as this may enhance their argument for their tool vs. standard tools like ImageJ if success rates are higher for their tool.

      We thank the Reviewer for this suggestion. We included a note on the analysis of the fluorescent datasets using the TrackMate plugin for ImageJ, performed previously in our lab, in the “Human T cells on immobilized recombinant BBB adhesion molecules” subsection.

      • Figure 6: Please put individual datapoints on the bar or violin plots where they are missing.

      We thank the Reviewer for this suggestion. However, adding about one thousand points corresponding to each cell would be impractical. We therefore present the distributions of the metrics evaluated from the data as histograms on violin plots rather than as swarm plots.

      • Lines 467-471: This argument is important and should be mentioned earlier in the introduction.

      Another point that can be mentioned is the application of this platform to imaging modalities in vivo (mouse or human) given that there is no fluorescent staining in these cases. This review may be relevant: https://doi.org/10.1002/jcb.10454

      We thank the Reviewer for this suggestion. We have clarified in the introduction that UFMTrack does not require fluorescent labels of the imaged migrating cells and relies solely on the phase contrast imaging data.

      • Discussion: Please address a few more potential applications to this study. One can be cancer and immune infiltration.

      We thank the Reviewer for this suggestion. We have elaborated on additional potential applications to the discussion section.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 327-328: The authors talk about ‘As we have previously shown…pMBMEC monolayers differs between CD4+ and CD8+ cells…’. Where was this shown? If it was in a previously published article, please provide a reference.

      We have added these missing references.  

      (2) Line 353: Please provide clear location on where to find the associated information instead of stating ‘see below’.

      We thank the Reviewer for pointing our attention to this ambiguity. We have corrected the phrase to “see next paragraph”

      (3) Line 439: Please correct the acronym to BMECs

      We thank the Reviewer for pointing our attention to this typo. We have corrected it.

    1. eLife Assessment

      This important study describes mRNA shortening during cellular stress and interestingly observes that this shortening is dependent on localization in stress granules. Surprisingly, this mRNA shortening does not appear to require the shortening of poly A tails. These are novel, paradigm-shifting findings, using cutting-edge technologies and convincing data, that should be of broad interest to the RNA community and beyond.

    2. Reviewer #1 (Public Review):

      In this manuscript, the authors employed direct RNA sequencing with nanopores, enhanced by 5' end adaptor ligation, to comprehensively interrogate the human transcriptome at single-molecule and nucleotide resolution. They conclude that cellular stress induces prevalent 5' end RNA decay that is coupled to translation and ribosome occupancy. Contrary to the literature, they found that, unlike typical RNA decay models in normal conditions, stress-induced RNA decay is dependent on XRN1 but does not depend on the removal of the poly(A) tail. The findings presented are interesting and the authors fully established these paradigm-shifting findings using cutting-edge technologies.

    3. Reviewer #2 (Public Review):

      In the manuscript "Full-length direct RNA sequencing uncovers stress-granule dependent RNA decay upon cellular stress", Dar, Malla, and colleagues use direct RNA sequencing on nanopores to characterize the transcriptome after arsenite and oxidative stress. They observe a population of transcripts that are shortened during stress. The authors hypothesize that this shortening is mediated by the 5'-3' exonuclease XRN1, as XRN1 knockdown results in longer transcripts. Interestingly, the authors do not observe a polyA-tail shortening, which is typically thought to precede decapping and XRN1-mediated transcript decay. Finally, the authors use G3BP1 knockout cells to demonstrate that stress granule formation is required for the observed transcript shortening. The manuscript contains intriguing findings of interest to the mRNA decay community.

    4. Reviewer #3 (Public Review):

      The work by Dar et al. examines RNA metabolism under cellular stress, focusing on stress-granule-dependent RNA decay. It employs direct RNA sequencing with a Nanopore-based method, revealing that cellular stress induces prevalent 5' end RNA decay that is coupled to translation and ribosome occupancy but is independent of the shortening of the poly(A) tail. This decay, however, is dependent on XRN1 and enriched in the stress granule transcriptome. Notably, inhibiting stress granule formation in G3BP1/2-null cells restores the RNA length to the same level as wild-type. It suppresses stress-induced decay, identifying RNA decay as a critical determinant of RNA metabolism during cellular stress and highlighting its dependence on stress-granule formation. This is an exciting and novel discovery utilizing innovative sequencing methods to study mRNA decay.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors employed direct RNA sequencing with nanopores, enhanced by 5' end adaptor ligation, to comprehensively interrogate the human transcriptome at single-molecule and nucleotide resolution. They conclude that cellular stress induces prevalent 5' end RNA decay that is coupled to translation and ribosome occupancy. Contrary to the literature, they found that, unlike typical RNA decay models in normal conditions, stress-induced RNA decay is dependent on XRN1 but does not depend on the removal of the poly(A) tail. The findings presented are interesting but a substantial amount of work is needed to fully establish these paradigm-shifting findings.

      Strengths:

      These are paradigm-shifting observations using cutting-edge technologies.

      Weaknesses:

      The conclusions do not appear to be fully supported by the data presented.

      Our response to the reviewer comments is provided at the end of this document in the section "Recommendations For The Authors"

      Reviewer #2 (Public Review):

      In the manuscript "Full-length direct RNA sequencing uncovers stress-granule dependent RNA decay upon cellular stress", Dar, Malla, and colleagues use direct RNA sequencing on nanopores to characterize the transcriptome after arsenite and oxidative stress. They observe a population of transcripts that are shortened during stress. The authors hypothesize that this shortening is mediated by the 5'-3' exonuclease XRN1, as XRN1 knockdown results in longer transcripts. Interestingly, the authors do not observe a polyA-tail shortening, which is typically thought to precede decapping and XRN1-mediated transcript decay. Finally, the authors use G3BP1 knockout cells to demonstrate that stress granule formation is required for the observed transcript shortening.

      The manuscript contains intriguing findings of interest to the mRNA decay community. That said, it appears that the authors at times overinterpret the data they get from a handful of direct RNA sequencing experiments. To bolster some of the statements additional experiments might be desirable.

      A selection of comments:

      (1) Considering that the authors compare the effects of stress, stress granule formation, and XRN1 loss on transcriptome profiles, it would be desirable to use a single cell system (and validate the findings in a few more). Most of the direct RNAseq is performed in HeLa cells, but the experiments showing that stress granule formation is required come from U2OS cells, while short RNAseq data showing loss of coverage on mRNA 5' ends is reanalyzed from HEK293 cells. It may be plausible that the same pathways operate in all those cells, but it is not rigorously demonstrated.

      We agree with the reviewer that performing all experiments in a single cell system would be desirable. Presently, our core findings on 5’ RNA shortening are all performed in HeLa cells: the identification of 5’ RNA shortening, the dependence of shortening on XRN1 (shown through XRN1 silencing), the suppression of shortening by translation inhibition, and now the relationship between 5’ shortening and deadenylation/decapping through experiments described further below. Our use of other cell lines is primarily to show that 5’ shortening is a general phenomenon, and we have now done this for U2OS cells, HEK293 cells, and primary 3T3 cells from mouse.

      Regarding stress granule formation, we are unfortunately restricted by the lack of available well-characterized resources. The ΔΔG3BP1/2 U2OS is a well-characterized cell line that has been extensively used for stress granule-related experiments. We have therefore opted to use it and performed experiments to verify both the occurrence of stress-induced RNA shortening and its rescue in the absence of stress granules. The reproducibility and breadth of the cell lines used in our analysis make us confident in the generality of our findings.

      (2) An interesting finding of the manuscript is that polyA tail shortening is not observed prior to transcript shortening. The authors would need to demonstrate that their approach is capable of detecting shortened polyA tails. Using polyA purified RNA to look at the status of polyA tail length may not be ideal (as avidity to oligodT beads may increase with polyA tail length and therefore the authors bias themselves to longer tails anyway). At the very least, the use of positive controls would be desirable; e.g. knockdown of CCR4/NOT.

      We thank the reviewer for their comment. Previous studies, using in vitro transcribed RNA molecules, have shown that direct RNA sequencing can capture and quantify poly(A) tails of varying lengths (Krause et al. 2019). Specifically, a range of 10 to 150 nt has been tested and a high concordance between known and dRNA-Seq determined values was observed. Both tailfindR and nanopolish (used in this work) showed high poly(A) tail estimation accuracy.

      Regardless, we agree with the reviewer that our method depends on poly(A) tail capture and thus may be incomplete for fully quantifying poly(A) length changes. We therefore opted to replace these data and instead follow this and other reviewers’ suggestions and perform experiments in which CCR4/NOT is inhibited, using cells expressing a catalytically inactive CNOT8 (CNOT8*) dominant negative mutant (Chang et al. 2019). Our new data show that stress-induced 5’ end decay is indeed not dependent on prior removal of the poly(A) tail. Specifically, we find that transcript shortening is still observed upon oxidative stress in cells expressing CNOT8* compared to control cells. We present these new results in Fig. 3 and Sup. Fig. 3.

      (3) The authors use a strategy of ligating an adapter to 5' phosphorylated RNA (presumably the breakdown fragments) to be able to distinguish true mRNA fragments from artifacts of abortive nanopore sequencing. This is a fantastic approach to curating a clean dataset. Unfortunately, the authors don't appear to go through with discarding fragments that are not adapter-ligated (presumably to increase the depth of analysis; they do offer Figure 1e that shows similar changes in transcript length for fragments with adapter, compared to Figure 1d). It would be good to know how many reads in total had the adapter. Furthermore, it would be good to know what percentage of reads without adapters are products of abortive sequencing. What percentage of reads had 5'OH ends (could be answered by ligating a different adapter to kinase-treated transcripts). More read curation would also be desirable when building the metagene analysis - why do the authors include every 3' end of sequenced reads (their RNA purification scheme requires a polyA tail, so non-polyadenylated fragments are recovered in a non-quantitative manner and should be discarded).

      We thank the reviewer for appreciating our approach. The reviewer is correct that we do not discard reads that are not adapter-ligated. As the reviewer correctly mentions, this is to increase the sequencing depth. We have found that the ligation efficiency is very low, ~1-2 % of total reads (now in Sup. Table 1), across all libraries, and so the percentage of REL5-ligated reads does not directly reflect the total amount of non-artifactual 5’ ends. Instead, we use these REL5-ligated reads as a subset of our data for which we have extremely high confidence in the true 5’ end. Our results show that non-ligated reads display the same length distribution as ligated ones, and that the results are reproducible regardless of read selection (e.g. Fig. 1c, e, Sup. Fig. 1k, l, Fig. 3b, c). This strong concordance between REL5-ligated and non-ligated reads suggests that our conclusions on 5’ end shortening are not substantially influenced by abortive sequencing or other artefactual creation of 5’ shortening. We have modified the text to clarify these points and have added plots using only ligated molecules for relevant figures where this was not previously done (Sup. Fig. 1l, 3c).
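
      As an illustration of the concordance check described above, the following minimal sketch compares two read-length distributions with a two-sample Kolmogorov-Smirnov statistic. It is not the authors' analysis code; the function name `ks_statistic` and the read lengths are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of a and b
    (0.0 for identical samples, 1.0 for fully separated samples).
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    # The gap between two step functions can only change at observed values.
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in points
    )


# Hypothetical read lengths in nucleotides; NOT data from the study.
ligated = [850, 900, 1200, 1500, 2100]
non_ligated = [800, 950, 1150, 1600, 2000, 2200]

d = ks_statistic(ligated, non_ligated)
print(f"KS statistic between ligated and non-ligated reads: {d:.3f}")
```

      A small statistic would be compatible with the two read subsets sharing a length distribution; a formal test would also need a p-value and real read counts.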

      We agree with the reviewer that non-polyadenylated reads could be discarded from metagene analysis and we have performed this change in the revised version. Our conclusions following removal of non-polyadenylated reads remain unchanged (Sup. Fig. 1g).

      (4) The authors should come to a clear conclusion about what "transcript shortening" means. Is it exonucleolytic shortening from the 5' end? They cannot say much about the 3' ends anyway (see above). Or are we talking about endonucleolytic cuts leaving 5'P that then can be attacked by XRN1 (again, what is the ratio of 5'P and 5'OH fragments; also, what is the ratio of shortened to full-length RNA)?

      We thank the reviewer for their suggestion. We have performed additional experiments to investigate the role of deadenylation and decapping by expressing dominant negative forms of the NOT8 deadenylase (NOT8*) and the DCP2 decapping enzyme (DCP2*) in HeLa cells. Our results show that neither expression of NOT8* nor DCP2* can inhibit stress-induced transcript shortening following arsenite treatment (Fig. 3e-f). These new data suggest that neither deadenylation nor decapping is required for stress-induced RNA decay. Instead, our data are more compatible with endonucleolytic cleavage as the most likely mechanism for stress-induced RNA decay. We have incorporated these results in the text and present them in Fig. 3 and Sup. Fig. 3.

      (5) The authors should clearly explain how they think the transcript shortening comes about. They claim it does not need polyA shortening, but then do not explain where the XRN1 substrate comes from. Does their effect require decapping? Or endonucleolytic attacks?

      Please also refer to our answer to the previous comment (#4). Collectively, our results from a) the dominant negative expression of NOT8* and DCP2*, which shows no effect on stress-induced shortening, and b) the rescue of transcript length upon translation initiation inhibition, indicate a potential endonucleolytic mechanism as a mediator of stress-induced RNA decay. However, we believe that extensive further studies, currently beyond the scope of this work, will be required to discover the nuclease and to dissect the exact molecular mechanisms that define the 5' ends of mRNAs upon stress-induced decay. We now discuss these points in the discussion.

      (6) XRN1 KD results in lengthened transcripts. That is not surprising as XRN1 is an exonuclease - and XRN1 does not merely rescue arsenite stress-mediated transcript shortening, but results in a dramatic transcript lengthening.

      The reviewer raises an intriguing point. Additional analysis of the data has shown that, in fact, in unstressed cells, XRN1 KD leads to a modestly significant reduction in overall transcript length (Fig. 3b, c). This could possibly be the result of an accumulation of intermediate cleavage products normally expected to be degraded by XRN1, as previously described (Pelechano, Wei, and Steinmetz 2015; Ibrahim et al. 2018).

      Instead, we find that under stress, XRN1 KD cells show a transcript length distribution that is almost identical to that of unstressed cells and significantly longer than that of siCTRL stressed cells (Fig. 3b, c). These results indicate that in the absence of XRN1, stress-induced decay is largely abolished. As the reviewer correctly points out, this seems to affect the majority of RNAs, which we believe is evidence of the general lack of specificity in the mechanism. Nevertheless, we find that transcripts that are the primary substrates of stress-induced shortening are substantially more lengthened than all other transcripts (Fig. 3e). This indicates that transcripts primarily affected by stress-induced decay are also lengthened the most in the absence of XRN1 and at an even higher level than expected by general XRN1 KD effects.

      Reviewer #3 (Public Review):

      The work by Dar et al. examines RNA metabolism under cellular stress, focusing on stress-granule-dependent RNA decay. It employs direct RNA sequencing with a Nanopore-based method, revealing that cellular stress induces prevalent 5' end RNA decay that is coupled to translation and ribosome occupancy but is independent of the shortening of the poly(A) tail. This decay, however, is dependent on XRN1 and enriched in the stress granule transcriptome. Notably, inhibiting stress granule formation in G3BP1/2-null cells restores the RNA length to the same level as wild-type. It suppresses stress-induced decay, identifying RNA decay as a critical determinant of RNA metabolism during cellular stress and highlighting its dependence on stress-granule formation.

      This is an exciting and novel discovery. I am not an expert in sequencing technologies or sequencing data analysis, so I will limit my comments purely to biology and not technical points. The PI is a leader in applying innovative sequencing methods to studying mRNA decay.

      One aspect that appeared overlooked is that poly(A) tail shortening per se does not lead to decapping. It is shortening below a certain threshold of 8-10 As that triggers decapping. Therefore, I found the conclusion that poly(A) tail shortening is not required for stress-induced decay to be somewhat premature. For a robust test of this hypothesis, the authors should consider performing their analysis in conditions where CNOT7/8 is knocked down with siRNA.

      We agree with the reviewer. We have now performed experiments in cells expressing a well-characterized catalytically inactive dominant negative NOT8 isoform (NOT8*) (Chang et al. 2019). Our new data show that stress-induced decay still occurs in cells expressing NOT8*. These results confirm our findings that stress-induced decay does not require deadenylation. We present these new results in Fig. 3 and Sup. Fig. 3.

      Similarly, as XRN1 requires decapping to take place, it necessitates the experiment where a dominant-negative DCP2 mutant is over-expressed.

      We agree with the reviewer and have performed this experiment as requested. Expression of a dominant negative DCP2 (DCP2*) isoform (Loh, Jonas, and Izaurralde 2013) in HeLa cells showed that decapping is also not required for stress-induced decay. We present these new results in Fig. 3 and Sup. Fig. 3.

      Are G3BP1/2 stress granules required for stress-induced decay or simply sites for storage? This part seems unclear. A very worthwhile test here would be to assess in XRN1-null background.

      We thank the reviewer for their comment. Our data show that stress-induced decay is not observed in ΔΔG3BP1/2 U2OS cells, which are unable to form stress granules (Fig. 6). This result suggests that G3BP1/2 SGs either a) are required for 5’ RNA shortening or b) preserve partially fragmented RNAs that would otherwise be rapidly degraded. We find the second option unlikely for two reasons. First, even if the fragments were rapidly degraded, we would still expect to find evidence of their presence in our data. However, Fig. 6f shows that the length distributions of ΔΔG3BP1/2 U2OS cells, with and without arsenite, are almost identical, thus arguing against the presence of such a pool of rapidly degrading RNAs. Second, if these RNAs were protected by SGs, then they would be expected to be downregulated in the absence of SGs in ΔΔG3BP1/2 U2OS cells treated with arsenite. Our results contradict this hypothesis as no association is found between the level of downregulation in arsenite-treated ΔΔG3BP1/2 U2OS cells and the observed stress-induced fragmentation in WT. Collectively, our results point towards G3BP1/2 stress granules being required for stress-induced decay. We have expanded on these points in the manuscript to clarify.

      Finally, the authors speculate that the mechanism of stress-induced decay may have evolved to relieve translational load during stress. But why degrade the 5' end when removing the cap may be sufficient? This returns to the question of assessing the role of decapping in this mechanism.

      The reviewer raises a very interesting point. Our new results, following expression of dominant negative DCP2, show that stress-induced decay does not require decapping. It is therefore plausible that a stress-induced co-translational mechanism cleaves mRNAs endonucleolytically to reduce the translational load. Such a mechanism would have many functional benefits, as it would acutely reduce the translational load, degrade non-essential RNAs, preserve energy and release ribosomes for translation of the stress response program. We have expanded the discussion to mention these points.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      As you can see from the comments, although the reviewers appreciate the novelty of your findings, there was a consensus opinion from all reviewers that the authors overinterpreted their data, since they only have one assay and did not fully analyze it, as laid out in one of the reviewer's critiques. Some orthogonal validation of the "groundbreaking" claims is necessary. Examination of the effects of upstream events in 5'-to-3' decay, namely deadenylation, and decapping, would be necessary for a better understanding of the phenomena the authors describe. Many tools and approaches for studying this are described well in the literature (CNOT7-KD, dominant negative DCP2 E148Q, XRN1-null cell lines), so it is well within the authors' reach. Overall, while some of the evidence presented is novel and solid, for some of the claims there is only incomplete evidence.

      We thank the reviewers and the editor for their comments and suggestions. We have performed several additional experiments to further support our conclusions. We have notably investigated the role of deadenylation and decapping in the stress-induced decay by expressing dominant negative NOT8 and DCP2, respectively, as suggested. Our results show that neither deadenylation nor decapping is necessary for stress-induced transcript shortening, suggesting an endonucleolytic event. We believe that these additional experiments strengthen the main conclusions of our work. 

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) The experiments were conducted in two unrelated cell lines, HeLa and U2OS. The authors should determine if the 5'end RNA decay in response to stress is also observed in normal human cells such as normal human diploid fibroblasts. Furthermore, it would be important to know if this mechanism is conserved between human and mouse cells. This can be tested in mouse embryonic fibroblasts.

      We thank the reviewer for their suggestion. We have now also performed experiments in the mouse embryonic fibroblast NIH 3T3 cell line. Our new results confirm that stress-induced 5’ end RNA decay is also observed in this primary cell line and is conserved between human and mouse (Sup. Fig. 1k, l).

      (2) The authors state that they monitored cell viability up to 24 hours after Arsenite treatment, but the data is shown up to 240 min (Suppl. 1a). Also, the Y-axis label of this Figure is "Active cells (%)". This should be changed to "Live cells (%)" if this is what they are referring to.

      We thank the reviewer for identifying this mistake. Cell viability was monitored up to 4 hours after arsenite treatment. We have corrected the text and modified the figure according to the reviewer’s suggestion.

      (3) Based on direct Nanopore-based RNA-seq the authors surprisingly found that RNAs in oxidative stress were globally shorter than unstressed cells. Since Nanopore-based RNA-seq will not detect RNAs that lack a poly A-tail, are they not missing out on RNAs that have already started getting degraded due to the loss of a poly A-tail? Also, I am not sure if they used a spike-in control which would be critical to claim global changes in RNA expression.

      We agree with the reviewer that our strategy does not capture RNA molecules without a poly(A) tail. Nevertheless, our data do identify shortening upon stress at the 5’ end of RNAs that include poly(A) tails. We considered this as direct evidence that decay at the 5’ end does not require prior removal of the poly(A) tail. Otherwise, these molecules would not have been captured and observed. Indeed, our newly added data from cells expressing a well characterized catalytically inactive dominant negative NOT8 isoform (Chang et al. 2019) show that stress-induced decay occurs even upon silencing of the CCR4-NOT deadenylation complex. We present these results in Fig. 3 and Sup. Fig 3.

      We would like to clarify that in our results we did not use a spike-in control and thus refrain from claiming global changes in RNA expression. Instead, we compare relative ratios of groups of molecules within libraries that are internally normalized, we perform correlative comparisons that are invariant to normalization and we perform differential gene expression using established normalization schemes such as DESeq2 (Love, Huber, and Anders 2014). 
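
      To illustrate why a within-library ratio does not need a spike-in, the sketch below shows that the fraction of reads shorter than a cutoff is unchanged when the same library is sampled at a different depth. This is only a toy example; the function `fraction_short`, the 1000 nt cutoff and the read lengths are hypothetical and not taken from the manuscript.

```python
def fraction_short(lengths, threshold):
    """Fraction of reads in one library that are shorter than threshold (nt)."""
    return sum(1 for n in lengths if n < threshold) / len(lengths)


# Hypothetical read lengths in nucleotides; NOT data from the study.
library = [400, 700, 1200, 1800, 2500]

# Same composition sequenced three times deeper: absolute counts change,
# but the within-library ratio does not.
deeper_library = library * 3

shallow = fraction_short(library, 1000)
deep = fraction_short(deeper_library, 1000)
print(shallow, deep)
```

      Absolute read counts, by contrast, scale with depth, which is why claims about global expression changes would require an external reference such as a spike-in.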

      (4) Many graphs are confusing and inconsistent. For example, samples for Nanopore RNA-seq were prepared in triplicates. Biological or technical? The schematic in Figure 1a shows ISRIB but it appears from Figure 4 onwards. It is missing in the Figure 1 results and the Figure legend. The X-axis labels of many graphs are confusing. For example, Supplementary Figure 1d, 1e, 1g and 1h. It says transcript length but are these nucleotides? P-values are missing from many of these graphs. For some graphs, the authors compared Unstressed vs Arsenite (Figure 1), but in other panels they state No Ars vs 0.5 mM Ars (Fig. 3a) or Control vs Ars (Figure 5c). Likewise, in Figure 1b, Expression change (log2) is unstressed vs Arsenite or Arsenite vs unstressed?

      We thank the reviewer for identifying these inconsistencies in the presentation of our results. The replicates for nanopore RNA-seq experiments were biological. We have now clarified this point in the text. Furthermore, we have removed “ISRIB” from Fig. 1a to avoid any confusion. We have also made our labelling across all figures more consistent, using “unstressed” for no arsenite treatment vs “arsenite” or “+ Ars” for arsenite treatment.

      (5) The authors transfected cells with siCTRL or siXRN1 using electroporation and treated the cells 72 hours after transfection. Since XRN1 is an essential gene, it would be important to determine the viability of cells 72 hours after transfection. Along these lines, in Figure 3b, it would be important to determine the effect of XRN1 knockdown in unstressed cells. Currently, there are only 3 comparisons in Figure 3b - unstressed, siCTRL + Ars and siXRN1 + Ars, and this is insufficient to conclude the effects of XRN1 knockdown in the presence of Arsenite.

      We thank the reviewer for their suggestion. We have updated Fig. 3b and the text to show the requested conditions: siCTRL and siXRN1 with and without arsenite. While XRN2 is an essential gene for many organisms, XRN1 is not essential in mammalian cells and no increased cell death has been reported for XRN1-KO or -KD cells (Brothers et al. 2023). We have also tested different concentrations (up to 40 nM) of siRNA and monitored the cells up to five days after transfection without observing any cell toxicity, as previously reported.

      (6) More broadly, the whole study is somewhat descriptive. The biological effect of 5'end mRNA shortening on gene expression is unclear. There is no data indicating how these changes in RNA lengths impact protein expression. Global quantitative proteomics would be critical to determine this.

      We thank the reviewer for their suggestion. To address this concern we have performed additional experiments using cells expressing catalytically inactive forms of NOT8 (Chang et al. 2019) and DCP2 (Loh, Jonas, and Izaurralde 2013) to inhibit deadenylation and decapping.

      These experiments provide additional mechanistic details for 5’ shortening and suggest endonucleolytic cleavage as a critical step (Fig. 3 and Sup. Fig. 3). We agree that it would be interesting to study the fate of these shortened transcripts notably regarding translation. However, given the complexity of the expected proteome changes also following global translation arrest under stress (Harding et al., 2003; Pakos-Zebrucka et al., 2016), we think that this work is beyond the scope of this manuscript and will be the subject of future studies. 

      Minor comments:

      (1) Some of the affected RNAs can be validated in HeLa and other cell lines.

      We thank the reviewer for their suggestion. We have performed RT-qPCR on 3 different mRNAs that present 5’ shortening upon oxidative stress, using different primer sets located along the mRNA. We hypothesized that the closer the primer set is located to the 5’ end, the less abundant the corresponding region would be in arsenite-treated compared to untreated cells. Our results indeed show that the measured level of these mRNAs depends on the location of the primer set used for the qPCR: the closer the primer set is to the 5’ end, the less abundant the mRNA is upon oxidative stress compared to control cells. We present these data as well as a schematic representing the positions of the primers in Sup. Fig. 2d.

      (2) The authors should check whether XRN1 also co-localizes in SGs.

      We thank the reviewer for their suggestion. We have performed immunofluorescence on U2OS and HeLa cells upon oxidative stress and did not observe co-localization of XRN1 with TIA-1, a marker of stress granules (see below). These results are consistent with Kedersha et al. (2005), who showed that XRN1 mainly co-localizes to processing bodies and is only very weakly detectable in SGs in DU145 cells. We think that this result is beyond the scope of this study and thus decided to only include it for the reviewers.

      Author response image 1.

      Representative immunofluorescence merged image of HeLa (left panel) and U2OS (right panel) cells treated with sodium arsenite and labelled with anti-TIA1 (red), anti-XRN1 (green) antibodies and DAPI (blue). Scale bar 50 µm.

      (3) XRN1 should be knocked down with more than one siRNA.

      We thank the reviewer for this suggestion. Our results show that our XRN1 KD specifically rescues the length of the most shortened mRNAs (Fig. 3e). This is a highly specific effect that makes us confident it is not mediated by non-specific siRNA binding; thus, we do not consider it necessary to repeat the experiment.

      (4) There are typos in the text regarding Figure 6d, e, and f. Also, Supplementary Figure 4a.

      We thank the reviewer for identifying these mistakes. We have corrected the typos. 

      Reviewer #3 (Recommendations For The Authors):

      The authors should consider testing their hypotheses by arresting the decay pathway using the approaches I mentioned previously. As it stands, some conclusions are somewhat speculative.

      We have replied to the reviewer comments in the public review section. 

      References:

      • Brothers, William R., Farah Ali, Sam Kajjo, and Marc R. Fabian. 2023. “The EDC4-XRN1 Interaction Controls P-Body Dynamics to Link mRNA Decapping with Decay.” The EMBO Journal, August, e113933.

      • Chang, Chung-Te, Sowndarya Muthukumar, Ramona Weber, Yevgen Levdansky, Ying Chen, Dipankar Bhandari, Catia Igreja, Lara Wohlbold, Eugene Valkov, and Elisa Izaurralde. 2019. “A Low-Complexity Region in Human XRN1 Directly Recruits Deadenylation and Decapping Factors in 5’-3’ Messenger RNA Decay.” Nucleic Acids Research 47 (17): 9282–95.

      • Harding, Heather P., Yuhong Zhang, Huiquing Zeng, Isabel Novoa, Phoebe D. Lu, Marcella Calfon, Navid Sadri, et al. 2003. “An Integrated Stress Response Regulates Amino Acid Metabolism and Resistance to Oxidative Stress.” Molecular Cell 11 (3): 619–33.

      • Ibrahim, Fadia, Manolis Maragkakis, Panagiotis Alexiou, and Zissimos Mourelatos. 2018. “Ribothrypsis, a Novel Process of Canonical mRNA Decay, Mediates Ribosome-Phased mRNA Endonucleolysis.” Nature Structural & Molecular Biology 25 (4): 302–10.

      • Kedersha, Nancy, Georg Stoecklin, Maranatha Ayodele, Patrick Yacono, Jens Lykke-Andersen, Marvin J. Fritzler, Donalyn Scheuner, Randal J. Kaufman, David E. Golan, and Paul Anderson. 2005. “Stress Granules and Processing Bodies Are Dynamically Linked Sites of mRNP Remodeling.” The Journal of Cell Biology 169 (6): 871–84.

      • Krause, Maximilian, Adnan M. Niazi, Kornel Labun, Yamila N. Torres Cleuren, Florian S. Müller, and Eivind Valen. 2019. “Tailfindr: Alignment-Free Poly(A) Length Measurement for Oxford Nanopore RNA and DNA Sequencing.” RNA 25 (10): 1229–41.

      • Loh, Belinda, Stefanie Jonas, and Elisa Izaurralde. 2013. “The SMG5-SMG7 Heterodimer Directly Recruits the CCR4-NOT Deadenylase Complex to mRNAs Containing Nonsense Codons via Interaction with POP2.” Genes & Development 27 (19): 2125–38.

      • Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 550.

      • Pakos-Zebrucka, Karolina, Izabela Koryga, Katarzyna Mnich, Mila Ljujic, Afshin Samali, and Adrienne M. Gorman. 2016. “The Integrated Stress Response.” EMBO Reports 17 (10): 1374–95.

      • Pelechano, Vicent, Wu Wei, and Lars M. Steinmetz. 2015. “Widespread Co-Translational RNA Decay Reveals Ribosome Dynamics.” Cell 161 (6): 1400–1412.

    1. eLife Assessment

      This valuable study has the potential to shed mechanistic light on how attention mechanisms that influence competition between multiple visual stimuli are modulated by the relative neural similarity of these stimuli. The study provides convincing data that will also be used for future modeling efforts. The study will be of interest to researchers working on the neural basis of visual attention.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report an fMRI investigation of the neural mechanisms by which selective attention allows capacity-limited perceptual systems to preferentially represent task-relevant visual stimuli. Specifically, they examine competitive interactions between two simultaneously-presented items from different categories, to reveal how task-directed attention to one of them modulates the activity of brain regions that respond to both. The specific hypothesis is that attention will bias responses to be more like those elicited by the relevant object presented on its own, and further that this modulation will be stronger for more dissimilar stimulus pairs. This pattern was confirmed in univariate analyses that measured the mass response of a priori regions of interest, as well as multivariate analyses that considered the patterns of evoked activity within the same regions. The authors follow these neuroimaging results with a simulation study that favours a "tuning" mechanism of attention (enhanced responses to highly effective stimuli, and suppression for ineffective stimuli) to explain this pattern.

      Strengths:

      The manuscript clearly articulates a core issue in the cognitive neuroscience of attention, namely the need to understand how limited perceptual systems cope with complex environments in the service of the observer's goals. The use of a priori regions of interest (and a control region), and the inclusion of both univariate and multivariate analyses as well as a simple model, are further strengths. The authors carefully derive clear indices of attentional effects (for both univariate and multivariate analyses) which makes explication of their findings easy to follow.

      Weaknesses:

      Direct estimation of baseline responses may have improved the validity of the modelling. The presentation of transparently overlapping items has some methodological advantages, but somewhat limits the ecological validity of connections to real-world visual "clutter".

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors report an fMRI investigation of the neural mechanisms by which selective attention allows capacity-limited perceptual systems to preferentially represent task-relevant visual stimuli. Specifically, they examine competitive interactions between two simultaneously-presented items from different categories, to reveal how task-directed attention to one of them modulates the activity of brain regions that respond to both. The specific hypothesis is that attention will bias responses to be more like those elicited by the relevant object presented on its own, and further that this modulation will be stronger for more dissimilar stimulus pairs. This pattern was confirmed in univariate analyses that measured the mass response of a priori regions of interest, as well as multivariate analyses that considered the patterns of evoked activity within the same regions. The authors follow these neuroimaging results with a simulation study that favours a "tuning" mechanism of attention (enhanced responses to highly effective stimuli, and suppression for ineffective stimuli) to explain this pattern.

      Strengths:

      The manuscript clearly articulates a core issue in the cognitive neuroscience of attention, namely the need to understand how limited perceptual systems cope with complex environments in the service of the observer's goals. The use of a priori regions of interest, and the inclusion of both univariate and multivariate analyses as well as a simple model, are further strengths. The authors carefully derive clear indices of attentional effects (for both univariate and multivariate analyses) which makes explication of their findings easy to follow.

      Weaknesses:

      There are some relatively minor weaknesses in presentation, where the motivation behind some of the procedural decisions could be clearer. There are some apparently paradoxical findings reported -- namely, cases in which the univariate response to pairs of stimuli is greater than to the preferred stimulus alone -- that are not addressed. It is possible that some of the main findings may be attributable to range effects: notwithstanding the paradox just noted, it seems that a floor effect should minimise the range of possible attentional modulation of the responses to two highly similar stimuli. One possible limitation of the modelled results is that they do not reveal any attentional modulation at all under the assumptions of the gain model, for any pair of conditions, implying that as implemented the model may not be correctly capturing the assumptions of that hypothesis.

      We thank the reviewer for the constructive comments. In response, we have improved the presentation in the current version of the manuscript. In this letter, we further discuss how the response in paired conditions is in some cases higher than the response to the preferred stimulus. For this, we provide a vector illustration and a supplementary figure of the sum of weights, which shows that the weights of isolated-stimulus responses for each category pair are not bound to the similarity of the two isolated responses.

      Regarding the simulation results, we have clarified that the univariate effect of attention is not the attentional modulation itself, but the change in the amount of attentional modulation in the two paired conditions. We provide an explanation for this in this letter below, and have changed the term “attentional modulation” to “univariate shift” in the manuscript to avoid the confusion.

      Reviewer #2 (Public Review):

      Summary:

      In an fMRI study requiring participants to attend to one or another object category, either when the object was presented in isolation or with another object superimposed, the authors compared measured univariate and multivariate activation from object-selective and early visual cortex to predictions derived from response gain and tuning sharpening models. They observed a consistent result across higher-level visual cortex that more-divergent responses to isolated stimuli from category pairs predicted a greater modulation by attention when attending to a single stimulus from the category pair presented simultaneously, and argue via simulations that this must be explained by tuning sharpening for object categories.

      Strengths:

      - Interesting experiment design & approach - testing how category similarity impacts neural modulations induced by attention is an important question, and the experimental approach is principled and clever.

      - Examination of both univariate and multivariate signals is an important analysis strategy.

      - The acquired dataset will be useful for future modeling studies.

      Weaknesses:

      - The experimental design does not allow for a neutral 'baseline' estimate of neural responses to stimulus categories absent attention (e.g., attend fixation), nor of the combination of the stimulus categories. This seems critical for interpreting results (e.g., how should readers understand univariate results like that plotted in Fig. 4C-D, where the univariate response is greater for 2 stimuli than one, but the analyses are based on a shift between each extreme activation level?).

      We are happy to clarify our research rationale. We aimed to compare responses in paired conditions when the stimuli were kept constant while varying the attentional target. After we showed that the change in the attentional target resulted in a response change, we compared the amount of this response change across different stimulus category pairs to investigate the effect of representation similarity between the target and the distractor on the response modulation caused by attentional shift. While an estimate of the neural responses in the absence of attention might be useful for other modeling studies, it would not provide us with more information than the current data to answer the question of this study.

      Regarding the univariate results in Fig. 4C-D (and other equivalent ROI results in the revised version) and our analyses, we did not impose any limit on the estimated weights of the two isolated responses in the paired response and thus the sum of the two weights could be any number. We see, however, that the name “weighted average”, which implies a sum of weights capped at one, has been misleading. We have now changed the name of this model to “linear combination” to avoid confusion.

      Previous studies (Reddy et al., 2009, Doostani et al., 2023) using a similar approach have shown a related results pattern: the response to multiple stimuli is higher than the average, but lower than the sum of the isolated responses, which is exactly what our results suggest. We have added discussion on this topic in the Results section in lines 409-413 for clarification:

      “Note that the response in paired conditions can be higher or lower than the response to the isolated more preferred stimulus (condition Mat), depending on the voxel response to the two presented stimuli, as previously reported (Doostani et al. 2023). This is consistent with previous studies reporting the response to multiple stimuli to be higher than the average, but lower than the sum of the response to isolated stimuli (Reddy et al. 2009).”

      We are not sure what the reviewer means by “each extreme activation level”. Our analyses are based on all four conditions. The two isolated conditions are used to calculate the distance measures and the two paired conditions are used for calculating the shift index. Please note that either the isolated or the paired conditions could show the highest response and we see both cases in our data. For example, as shown in Figure 4A in EBA, the isolated Body condition and the paired BodyatCar condition show the highest activation levels for the Body-Car pair, whereas in Figure 4C, the two paired conditions (BodyatCat and BodyCatat) elicit the highest response.

      - Related, simulations assume there exists some non-attended baseline state of each individual object representation, yet this isn't measured, and the way it's inferred to drive the simulations isn't clearly described.

      We agree that the simulations assume a non-attended baseline state, and that we did not measure that state empirically. We needed this non-attended response in the simulations to test which attention mechanism led to the observed results. Thus, we generated the non-attended response using the data reported in previous neural studies of object recognition and attention in the visual cortex (Ni et al., 2012, Bao and Tsao, 2018). Note that the simulations are checking for the profile of the modulations based on category distance. Thus, they do not need to exactly match the real isolated responses in order to show the effect of gain and tuning shift on the results. We include the clarification and the range of neural responses and attention parameters used in the simulations in the revised manuscript in lines 327-333:

      “To examine which attentional mechanism leads to the effects observed in the empirical data, we generated the neural response to unattended object stimuli as a baseline response in the absence of attention, using the data reported by neural studies of object recognition in the visual cortex (Ni et al., 2012, Bao and Tsao, 2018). Then, using an attention parameter for each neuron and different attentional mechanisms, we simulated the response of each neuron to the different task conditions in our experiment. Finally, we assessed the population response by averaging neural responses.”

      - Some of the simulation results seem to be algebraic (univariate; Fig. 7; multivariate, gain model; Fig. 8)

      This is correct. We have used algebraic equations for the effect of attention on neural responses in the simulations. In fact, thinking about the two models of gain and tuning shift leads to the algebraic equations, which in turn logically lead to the observed results if no noise is added to the data. The simulations are helpful for visualizing these logical conclusions. Also, after assigning different noise levels to each condition for each neuron, the results are no longer algebraic, as shown in the updated Figure 7 and Figure 8.

      - Cross-validation does not seem to be employed - strong/weak categories seem to be assigned based on the same data used for computing DVs of interest - to minimize the potential for circularity in analyses, it would be better to define preferred categories using separate data from that used to quantify - perhaps using a cross-validation scheme? This appears to be implemented in Reddy et al. (2009), a paper implementing a similar multivariate method and cited by the authors (their ref 6).

      Thank you for pointing out the missing details about how we used cross-validation. In the univariate analysis, we did use cross validation, defining preferred categories and calculating category distance on one half of the data and calculating the univariate shift on the other half of the data. Similarly, we employed cross-validation for the multivariate analysis by using one half of the data to calculate the multivariate distance between category pairs, and the other half of the data to calculate the weight shift for each category pair. We have now added this methodological information in the revised manuscript.
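      As an illustration only, the split-half logic described above can be sketched as follows. The odd/even split of runs, the function name `split_half_labels`, and the example values are assumptions made for this sketch and are not taken from the manuscript.

```python
import numpy as np

def split_half_labels(runs_x, runs_y):
    """Label the more/less preferred category on one half of the runs
    and read out the responses from the other, independent half.

    runs_x, runs_y: per-run responses to the two isolated categories
    (hypothetical inputs; the real analysis may split the data differently).
    """
    runs_x = np.asarray(runs_x, dtype=float)
    runs_y = np.asarray(runs_y, dtype=float)
    # Training half (runs 0, 2, ...) defines preference and category distance.
    train_x, train_y = runs_x[0::2].mean(), runs_y[0::2].mean()
    # Held-out half (runs 1, 3, ...) supplies the responses that are analysed.
    test_x, test_y = runs_x[1::2].mean(), runs_y[1::2].mean()
    distance = abs(train_x - train_y)
    if train_x >= train_y:
        return distance, test_x, test_y  # x labelled "more preferred"
    return distance, test_y, test_x      # y labelled "more preferred"

# Illustrative values only.
distance, more_pref, less_pref = split_half_labels(
    [2.0, 1.0, 2.2, 1.2], [1.0, 1.5, 1.2, 1.3]
)
```

      In this toy example the held-out response to the category labelled "more preferred" is lower than that to the other category. That can happen precisely because the labels come from independent data, which is what guards against circularity.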

      - Multivariate distance metric - why is correlation/cosine similarity used instead of something like Euclidean or Mahalanobis distance? Correlation/cosine similarity is scale-invariant, so changes in the magnitude of the vector would not change distance, despite this likely being an important data attribute to consider.

      Since we are considering response patterns as vectors in each ROI, there is no major difference between the two measures for similarity. Using Euclidean distance as a measure of distance (i.e. inverse of similarity) we observed the same relationship between weight shift and Euclidean category distance. There was a positive correlation between weight shift and the Euclidean category distance in all ROIs (ps < 0.01, ts > 2.9) except for V1 (p = 0.5, t = 0.66). We include this information in the revised manuscript in the Results section lines 513-515:

      “We also calculated category distance based on the euclidean distance between response patterns of category pairs and observed a similarly positive correlation between the weight shift and the euclidean category distance in all ROIs (ps < 0.01, ts > 2.9) except V1 (p = 0.5, t = 0.66).”
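      The reviewer's point about scale invariance can be checked with a small numerical sketch. The two voxel patterns below are hypothetical values chosen for illustration, and the helper names are not from the manuscript.

```python
import numpy as np

def correlation_distance(u, v):
    """1 minus the Pearson correlation between two response patterns."""
    return 1.0 - np.corrcoef(u, v)[0, 1]

def euclidean_distance(u, v):
    """Euclidean distance between two response patterns."""
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))

# Hypothetical voxel patterns for two categories (illustrative values only).
pattern_a = np.array([1.0, 2.0, 4.0, 3.0])
pattern_b = np.array([2.0, 1.0, 3.0, 5.0])

# Doubling the amplitude of one pattern leaves the correlation distance unchanged...
d_corr = correlation_distance(pattern_a, pattern_b)
d_corr_scaled = correlation_distance(2.0 * pattern_a, pattern_b)

# ...but changes the Euclidean distance.
d_euc = euclidean_distance(pattern_a, pattern_b)
d_euc_scaled = euclidean_distance(2.0 * pattern_a, pattern_b)
```

      Because the two measures respond differently to amplitude changes, finding the same relationship with both suggests that the effect is not driven by pattern magnitude alone.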

      - Details about simulations implemented (and their algebraic results in some cases) make it challenging to interpret or understand these results. E.g., the noise properties of the simulated data aren't disclosed, nor are precise (or approximate) values used for simulating attentional modulations.

      We clarify that the average response to each category was based on previous neurophysiology studies (Ni et al., 2012, Bao and Tsao, 2018). The attentional parameter was also chosen based on previous neurophysiology (Ni et al., 2012) and human fMRI (Doostani et al., 2023) studies of visual attention by randomly assigning a value in the range from 1 to 10. We have included the details in the Methods section in lines 357-366:

      “We simulated the action of the response gain model and the tuning sharpening model using numerical simulations. We composed a neural population of 4 × 10^5 neurons in equal proportions body-, car-, cat- or house-selective. Each neuron also responded to object categories other than its preferred category, but to a lesser degree and with variation. We chose neural responses to each stimulus from a normal distribution with the mean of 30 spikes/s and standard deviation of 10 and each neuron was randomly assigned an attention factor in the range between 1 and 10 using a uniform distribution. These values are comparable with the values reported in neural studies of attention and object recognition in the ventral visual cortex (Ni et al. 2012, Bao and Tsao 2018). We also added Poisson noise to the response of each neuron (Britten et al. 1993), assigned randomly for each condition of each neuron.”
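      A minimal sketch of how a population with these stated parameters could be generated is given below. It is not the authors' code. The way selectivity is imposed (moving each neuron's largest response into its preferred category), the exact form of the two attention rules, the restriction to isolated attended stimuli, and the reduced population size are all assumptions made for illustration.

```python
import numpy as np

def make_population(n_neurons=4000, n_categories=4, seed=0):
    """Baseline (unattended) responses, preferred category and attention factor.

    n_neurons must be divisible by n_categories (equal proportions).
    """
    rng = np.random.default_rng(seed)
    # Responses drawn from N(30, 10) spikes/s, clipped at zero (assumption:
    # firing rates cannot be negative).
    base = np.clip(rng.normal(30.0, 10.0, size=(n_neurons, n_categories)), 0.0, None)
    preferred = np.repeat(np.arange(n_categories), n_neurons // n_categories)
    # Assumption: make each neuron respond most strongly to its preferred
    # category by swapping its largest response into that column.
    rows = np.arange(n_neurons)
    top = base.argmax(axis=1)
    top_val = base[rows, top].copy()
    base[rows, top] = base[rows, preferred]
    base[rows, preferred] = top_val
    # Attention factor per neuron, uniform between 1 and 10.
    attention = rng.uniform(1.0, 10.0, size=n_neurons)
    return base, preferred, attention, rng

def attend_gain(base, attention, category):
    """Response gain (assumed form): attended response is scaled up."""
    return base[:, category] * attention

def attend_sharpening(base, attention, preferred, category):
    """Tuning sharpening (assumed form): enhance if the attended category
    is the neuron's preferred one, suppress otherwise."""
    is_pref = preferred == category
    return np.where(is_pref, base[:, category] * attention,
                    base[:, category] / attention)

def add_poisson_noise(rates, rng):
    """Draw a Poisson count around each noise-free rate."""
    return rng.poisson(rates).astype(float)

base, preferred, attention, rng = make_population()
gain = attend_gain(base, attention, category=0)
sharp = attend_sharpening(base, attention, preferred, category=0)
# Population response as the average over noisy single-neuron responses.
population_gain = add_poisson_noise(gain, rng).mean()
population_sharp = add_poisson_noise(sharp, rng).mean()
```

      Under these assumed rules, the two mechanisms differ only for neurons whose preferred category is not the attended one, which is why the population averages diverge.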

      - Eye movements do not seem to be controlled nor measured. Could it be possible that some stimulus pairs result in more discriminable patterns of eye movements? Could this be ruled out by some aspect of the results?

      Subjects were instructed to direct their gaze towards the fixation point. Given the variation in the pose and orientation of the stimuli, it is unlikely that eye movements would help with the task. Eye movements have been controlled in previous experiments with individual stimulus presentation (Xu and Vaziri-Pashkam, 2019) and across attentional tasks in which colored dots were superimposed on the stimuli (Vaziri-Pashkam and Xu, 2017) and no significant difference for eye movement across categories or conditions was observed. As such, we do not think that eye movements would play a role in the results we are observing here.

      - A central, and untested/verified, assumption is that the multivariate activation pattern associated with 2 overlapping stimuli (with one attended) can be modeled as a weighted combination of the activation pattern associated with the individual stimuli. There are hints in the univariate data (e.g., Fig. 4C; 4D) that this might not be justified, which somewhat calls into question the interpretability of the multivariate results.

      If the reviewer is referring to the higher response in the paired compared to the isolated conditions, as explained above, we have not forced any limit on the sum of the estimated weights to equal 1 or 2. Therefore, our model is an estimation of a linear combination of the two multivariate patterns in the isolated conditions. In fact, Leila Reddy et al. (reference 6) reported that while the combination is closer to a weighted average than to a weighted sum, the sum of the weights is on average larger than 1. In Figure 4C and 4D the responses in the paired conditions are higher than either of the isolated-condition responses. This suggests that the weights for the linear combination of isolated responses in the multivariate analysis should add up to more than one. This is what we find in our results. We have added a supplementary figure to Figure 6, depicting the sum of weights for different category pairs in all ROIs. The figure illustrates that in each ROI, the sum of weights is greater than 1 for some category pairs. It is however noteworthy that we normalized the weights in each condition by the sum of weights to calculate the weight shift in our analysis. The amount of the weight shift was therefore not affected by the absolute value of the weights.
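      The unconstrained linear combination and the normalization step described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the synthetic patterns, and the specific definition of the weight shift (difference in the normalized weight of one stimulus between the two attention conditions) are hypothetical.

```python
import numpy as np

def fit_weights(paired, iso_x, iso_y):
    """Least-squares weights (a_x, a_y) with paired ≈ a_x*iso_x + a_y*iso_y.
    The weights are unconstrained, so their sum may exceed 1."""
    design = np.column_stack([iso_x, iso_y])
    weights, *_ = np.linalg.lstsq(design, paired, rcond=None)
    return weights

def normalized_weight_shift(paired_attend_x, paired_attend_y, iso_x, iso_y):
    """Assumed definition: change in the normalized weight of stimulus x
    when attention moves from y to x."""
    w_attend_x = fit_weights(paired_attend_x, iso_x, iso_y)
    w_attend_y = fit_weights(paired_attend_y, iso_x, iso_y)
    # Normalizing by the sum removes the dependence on absolute weight size.
    return w_attend_x[0] / w_attend_x.sum() - w_attend_y[0] / w_attend_y.sum()

# Synthetic, noise-free voxel patterns (illustrative only).
rng = np.random.default_rng(0)
iso_x = rng.normal(size=50)
iso_y = rng.normal(size=50)
paired_attend_x = 0.9 * iso_x + 0.5 * iso_y  # weights sum to 1.4
paired_attend_y = 0.5 * iso_x + 0.9 * iso_y

weights = fit_weights(paired_attend_x, iso_x, iso_y)
shift = normalized_weight_shift(paired_attend_x, paired_attend_y, iso_x, iso_y)
# Doubling both paired responses doubles the weights but leaves the shift unchanged.
shift_scaled = normalized_weight_shift(2 * paired_attend_x, 2 * paired_attend_y,
                                       iso_x, iso_y)
```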

      - Throughout the manuscript, the authors consistently refer to "tuning sharpening", an idea that's almost always used to reference changes in the width of tuning curves for specific feature dimensions (e.g., motion direction; hue; orientation; spatial position). Here, the authors are assaying tuning to the category (across exemplars of the category). The link between these concepts could be strengthened to improve the clarity of the manuscript.

      The reviewer brings up an excellent point. Whereas tuning curves have been extensively used for feature dimensions such as stimulus orientation or motion direction, here, we used the term to describe the variation in a neuron’s response to different object stimuli.

      With a finite set of object categories, as is the case in the current study, the neural response in object space is discrete, rather than a continuous curve illustrated for features such as stimulus orientation. However, since more preferred and less preferred features (objects in this case) can still be defined, we illustrated the neural response using a hypothetical curve in object space in Figure 3 to show how it relates with other stimulus features. Therefore, here, tuning sharpening refers to the fact that the response to the more preferred object categories has been enhanced while the response to the less preferred stimulus categories is suppressed.

      We clarify this point in the revised manuscript in the Discussion section lines 649-659:

      “While tuning curves are commonly used for feature dimensions such as stimulus orientation or motion direction, here, we used the term to describe the variation in a neuron’s response to different object stimuli. With a finite set of object categories, as is the case in the current study, the neural response in object space is discrete, rather than a continuous curve illustrated for features such as stimulus orientation. The neuron might have tuning for a particular feature such as curvature or spikiness (Bao et al., 2020) that is present to different degrees in our object stimuli in a continuous way, but we are not measuring this directly. Nevertheless, since more preferred and less preferred features (objects in this case) can still be defined, we illustrate the neural response using a hypothetical curve in object space. As such, here, tuning sharpening refers to the fact that the response to the more preferred object categories has been enhanced while the response to the less preferred stimulus categories is suppressed.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      a. The authors should address the apparent paradox noted above (and report whether it is seen in other regions of interest as well). On what model would the response to any pair of stimuli exceed that of the response to the preferred stimulus alone? This implies some kind of Gestalt interaction whereby the combined pair generates a percept that is even more effective for the voxels in question than the "most preferred" one?

      The response to a pair of stimuli can exceed the response to each of the stimuli presented in isolation if the voxel is responsive to both stimuli and as long as the voxel has not reached its saturation level. This phenomenon has been reported in many previous studies (Zoccolan et al., 2005, Reddy et al., 2009, Ni et al., 2012, Doostani et al., 2023) and can be modeled using a linear combination model which does not constrain the weights of the isolated responses to sum to 1 (Doostani et al., 2023). Note that the “most preferred” stimulus does not necessarily saturate the voxel response, thus the response to two stimuli could be more effective based on voxel responsiveness to the second stimulus.

      As for the current study, the labels “more preferred” and “less preferred” are only relatively defined (as explained in the Methods section), meaning that the more preferred stimulus is not necessarily the most preferred stimulus for the voxels. Furthermore, the presented stimuli are semi-transparent and presented at low contrast, which moves the responses further away from the saturation level. Based on reported evidence for multiple-stimulus responses, responses to single stimuli are in many cases sublinearly added to yield the multiple-stimulus response (Zoccolan et al., 2005, Reddy et al., 2009, Doostani et al., 2023). This means that the multiple-stimulus response is lower than the sum of the isolated responses, but not necessarily lower than each of the isolated responses. Therefore, it is not paradoxical to observe higher responses in paired conditions compared to the isolated conditions. We observe similar results in other ROIs, which we provide as supplementary figures to Figure 4 in the revised manuscript.

      We address this observation and similar reports in previous studies in the Results section of the revised manuscript in lines 409-413:

      “Note that the response in paired conditions can be higher or lower than the response to the isolated more preferred stimulus (condition Mat), depending on the voxel preference for the two presented stimuli, as previously reported (Doostani et al., 2023). This is consistent with previous studies reporting the response to multiple stimuli to be higher than the average, but lower than the sum of the response to isolated stimuli (Reddy et al., 2009).”

      b. Paradox aside, I wondered to what extent the results are in part explained by range limits. Take two categories that evoke a highly similar response (either mean over a full ROI, or in the multivariate sense). That imposes a range limit such that attentional modulation, if it works the way we think it does, could only move responses within that narrow range. In contrast, the starting point for two highly dissimilar categories leaves room in principle for more modulation.

      We do not believe that the results can be explained by range limits because responses in paired conditions are not limited by the isolated responses, as can be observed in Figure 4. However, to rule out the possibility of the similarity between responses in isolated conditions affecting the range within which responses in paired conditions can change, we turned to the multivariate analysis. We used the weight shift measure as the change in the weight of each stimulus with the change in the attentional target. In this method, no matter how close the two isolated vectors are, the response to the pair could still have a whole range of different weights of the isolated responses. We have plotted an example illustration of two-dimensional vectors for better clarification. Here, the vectors Vxat and Vyat denote the responses to the isolated x and y stimuli, respectively, and the vector Pxaty denotes the response to the paired condition in which stimulus x is attended. The weights a1 and a2 are illustrated in the figure; they are equal to the regression coefficients obtained by solving the equation Pxaty = [a1 a2] [Vxat Vyat]'. While the weight values depend on the amplitude of and the angle between the three vectors, they are not limited by a small angle between Vxat and Vyat.

      We have updated Figure 2 in the manuscript to avoid the confusion. We have also added a figure including the sum of weights for different category pairs in different regions, showing that the sum of weights are not dependent on the similarity between the two stimuli. The conclusions based on the weight shift are therefore not confounded by the similarity between the two stimuli.
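      The geometric argument about two-dimensional vectors can be illustrated numerically. In the sketch below, the vectors, the 5-degree angle, and the helper name `recover` are hypothetical choices; the example is noise-free and only shows that a small angle between the isolated responses does not restrict which weights can be recovered.

```python
import numpy as np

# Two isolated-response vectors only 5 degrees apart (hypothetical values).
angle = np.deg2rad(5.0)
v_x = np.array([1.0, 0.0])
v_y = np.array([np.cos(angle), np.sin(angle)])

def recover(a1, a2):
    """Build a paired response a1*v_x + a2*v_y and solve for the weights."""
    paired = a1 * v_x + a2 * v_y
    design = np.column_stack([v_x, v_y])
    return np.linalg.solve(design, paired)

# Very different weight pairs are both recovered exactly,
# even though v_x and v_y are nearly collinear.
w_equal = recover(0.7, 0.7)
w_skewed = recover(1.2, 0.1)
```

      With measurement noise, weight estimates for nearly collinear vectors become less precise, so this sketch speaks only to the range of possible weights and not to their estimation error.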

      c. Finally, related to the previous point, while including V1 is a good control, I wonder if it is getting a "fair" test here, because the range of responses to the four categories in this region, in terms of (dis)similarity, seems compressed relative to the other categories.

      We believe that V1 is getting a fair test because the single-subject range of category distance in V1 is similar to LO, as can be observed in Author response image 1:

      Author response image 1.

      Range of category distance in each ROI averaged across participants

      The reason that V1 is showing a more compressed distance range on the average plot is that the category distance in V1 is not consistent among participants. Although the average plots are shown in Figure 5 and Figure 6, we tested statistical significance in each ROI based on single-subject correlation coefficients.

      Please also note that a more compressed range of dissimilarity does not necessarily lead to a weaker effect of category distance on the effect of attention. For instance, while LO shows a more compressed dissimilarity range for the presented categories compared to the other object-selective regions, it shows the highest correlation between weight shift and category distance. Furthermore, as illustrated in Figure 5, no significant correlation is observed between univariate shift and category distance in V1, even though the range of the univariate distance in V1 is similar to LO and pFs, where we observed a significant correlation between category distance and univariate shift.

      d. In general, the manuscript does a very good job explaining the methods of the study in a way that would allow replication. In some places, the authors could be clearer about the reasoning behind those methodological choices. For example: - How was the sample size determined?

      Estimating conservatively based on the smallest amount of attentional modulation we observed in a previous study (Doostani et al., 2023), we chose a medium effect size (0.3). For a power of 0.8, the minimum number of participants should be 16. We have added the explanation to the Methods section in lines 78-81:

      “We estimated the number of participants conservatively based on the smallest amount of attentional modulation observed in our previous study (Doostani et al., 2023). For a medium effect size of 0.3 and a power of 0.8, we needed a minimum number of 16 participants.”

      - Why did the authors choose those four categories? What was the evidence that would suggest these would span the range of similarities needed here?

      We chose these four categories based on a previous behavioral study reporting the average reaction time of participants when detecting a target from one category among distractors from another category (Xu and Vaziri-Pashkam, 2019). Ideally the experiment should include as many object categories as possible. However, since we were limited by the duration of the experiment, the number of conditions had to be controlled, leading to a maximum of 4 object categories. We chose two animate and two inanimate object categories to include categories that are more similar and more different based on previous behavioral results (Xu and Vaziri-Pashkam, 2019). We included body and house categories because they are both among the categories to which highly responsive regions exist in the cortex. We chose the two remaining categories based on their similarity to body and house stimuli. In this way, for each category there was another category that elicited similar cortical responses, and two categories that elicited different responses. While we acknowledge that the chosen categories do not fully span the range of similarities, they provide an observable variety of similarities in different ROIs which we find acceptable for the purposes of our study.

      We include this information in the Methods section of the revised manuscript in lines 89-94:

      “We included body and house categories because there are regions in the brain that are highly responsive and unresponsive to each of these categories, which provided us with a range of responsiveness in the visual cortex. We chose the two remaining categories based on previous behavioral results to include categories that provided us with a range of similarities (Xu and Vaziri-Pashkam, 2019). Thus, for each category there was a range of responsiveness in the brain and a range of similarity with the other categories.”

      - Why did the authors present the stimuli at the same location? This procedure has been adopted in previous studies, but of course, it does also move the stimulus situation away from the real-world examples of cluttered scenes that motivate the Introduction.

      We presented the stimuli at the same location because we aimed to study the mechanism of object-based attention and this experimental design helped us isolate it from spatial attention. We do not think that our design moves the stimulus situation away from real-world examples in such a way that our results are not generalizable. We include real-world instances, as well as a discussion on this point, in the Discussion section of the revised manuscript, in lines 611-620:

      “Although examples of superimposed cluttered stimuli are not very common in everyday life, they still do occur in certain situations, for example reading text on the cellphone screen in the presence of reflection and glare on the screen or looking at the street through a patterned window. Such instances recruit object-based attention which was the aim of this study, whereas in more common cases in which attended and unattended objects occupy different locations in space, both space-based and object-based attention may work together to resolve the competition between different stimuli. Here we chose to move away from usual everyday scenarios to study the effect of object-based attention in isolation. Future studies can reveal the effect of target-distractor similarity, i.e. proximity in space, on space-based attention and how the effects caused by object-based and space-based attention interact.”

      - While I'm not concerned about this (all relevant comparisons were within-participants) was there an initial attempt to compare data quality from the two different scanners?

      We compared the SNR values of the two groups of participants and observed no significant difference between these values (ps > 0.34, ts < 0.97). We have added this information to the Methods section.

      Regarding the observed effect, we performed a t-test between the results of the participants from the two scanners. For the univariate results, the observed correlation between univariate attentional modulation and category distance was not significantly different for participants of the two scanners in any ROIs (ps > 0.07, ts < 1.9). For the multivariate results, the observed correlation between the weight shift and multivariate category distance was not significantly different in any ROIs (ps > 0.48, ts < 0.71) except for V1 (p = 0.015, t = 2.75).

      We include a sentence about the comparison of the SNR values in the preprocessing section in the revised manuscript.

      e. There are a couple of analysis steps that could be applied to the existing data that might strengthen the findings. For one, the authors have adopted a liberal criterion of p < 0.001 uncorrected to include voxels within each ROI. Why, and to what extent is the general pattern of findings robust over more selective thresholds? Also, there are additional regions that are selective for bodies (fusiform body area) and scenes (occipital place area and retrosplenial cortex). Including these areas might provide more diversity of selectivity patterns (e.g. different responses to non-preferred categories) that would provide further tests of the hypothesis.

      We selected this threshold to allow for selection of a reasonable number of voxels in each hemisphere across all participants. To check whether the effect is robust over more selective thresholds, we redefined the left EBA region, as an example, using p < 0.0001 and p < 0.00001 and observed that the weight shift effect remained equivalent. We have made a note of this analysis in the Results section. As for the additional regions suggested by the reviewer, we chose not to include them because they could not be consistently defined in both hemispheres of all participants. Please note that the current ROIs also show different responses to non-preferred categories (e.g. in LO and pFs). We include this information in the Methods section in lines 206-207:

      “We selected this threshold to allow for selection of a reasonable number of voxels in each hemisphere across all participants.”

      And in the Results section in lines 509-512:

      “We performed the analysis including only voxels that had a significantly positive GLM coefficient across the runs and observed the same results. Moreover, to check whether the effect is robust over more selective thresholds for ROI definition, we redefined the left EBA region with p < 0.0001 and p < 0.00001 criteria. We observed a similar weight shift effect for both criteria.”

      f. One point the authors might address is the potential effect of blocking the paired conditions. If I understood right, the irrelevant item in each paired display was from the same category throughout a block. To what extent might this knowledge shape the way participants attend to the task-relevant item (e.g. by highlighting to them certain spatial frequencies or contours that might be useful in making that particular pairwise distinction)? In other words, are there theoretical reasons to expect different effects if the irrelevant category is not predictable?

      We believe that the participants’ knowledge about the distractor does not significantly affect our results because our results are in agreement with previous behavioral data (Cohen et al., 2014, Xu and Vaziri-Pashkam, 2019), in which the distractor could not be predicted. These reports suggest there is a theoretical reason to expect similar effects if the participants could not predict the distractor. To directly test this, one would need to perform an fMRI experiment using an event-related design, an interesting avenue for future research.

      We have made a note of this point in the Discussion section of the revised manuscript in lines 621-626:

      “Please note that we used a blocked design in which the target and distractor categories could be predicted across each block. While it is possible that the current design has led to an enhancement of the observed effect, previous behavioral data (Cohen et al., 2014, Xu and Vaziri-Pashkam, 2019) have reported the same effect in experiments in which the distractor was not predictable. To study the effect of predictability on fMRI responses, however, an event-related design is more appropriate, an interesting avenue for future fMRI studies.”

      g. The authors could provide behavioural data as a function of the specific category pairs. There is a clear prediction here about which pairs should be more or less difficult.

      We provide the behavioral data as a supplementary figure to Figure 1 in the revised manuscript. However, we do not see differences in behavior across the category pairs. This is because our fMRI task was designed to ensure that participants could properly attend to the target in all conditions. The task was rather easy across all conditions, and due to the ceiling effect there was no significant difference in behavioral performance between category pairs. The effect of category pair on behavior has, however, been previously tested and reported in a visual search paradigm with the same categories (Xu and Vaziri-Pashkam, 2019), which was in fact the basis for our choice of categories in this study (as explained in response to point “d” above).

      h. Figure 4 shows data for EBA in detail; it would be helpful to have a similar presentation of the data for the other ROIs as well.

      We provide data for all ROIs as figure supplements 1-4 to Figure 4 in the revised manuscript.

      i. For the pFs and LOC ROIs, it would be helpful to have an indication of what proportion of voxels was most/least responsive to each of the four categories. Was this a relatively even balance, or generally favouring one of the categories?

      In LO, the proportion of voxels most responsive to each of the four categories was relatively even for Body (31%) and House (32%) stimuli, which was higher than the proportion of Car- and Cat-preferring voxels (18% and 19%, respectively). In pFs, 40% of the voxels were house-selective, while the proportion was relatively even for voxels most responsive to bodies, cars, and cats with 21%, 17%, and 22% of the voxels, respectively. We include the percentage of voxels most responsive to each of the four categories in each ROI as Appendix 1-table 1.

      j. Were the stimuli in the localisers the same as in the main experiment?

      No, we used different sets of stimuli for the localizers and the main experiment. We have added the information in line 146 of the Methods section.

      Reviewer #2 (Recommendations For The Authors):

      (1) Why are specific ROIs chosen? Perhaps some discussion motivating these choices, and addressing the possible overlap between these and retinotopic regions (based on other studies, or atlases - Wang et al, 2015) would be useful.

      Considering that we used object categories, we decided to look at general object-selective regions (LO, pFs) as well as regions that are highly selective for specific categories (EBA, PPA). We also looked at the primary visual cortex as a control region. We have added this clarification in the Methods section lines 128-133:

      “Considering that we used object categories, we investigated five different regions of interest (ROIs): the object-selective areas lateral occipital cortex (LO) and posterior fusiform (pFs) as general object-selective regions, the body-selective extrastriate body area (EBA) and the scene-selective parahippocampal place area (PPA) as regions that are highly selective for specific categories, and the primary visual cortex (V1) as a control region. We chose these regions because they could all be consistently defined in both hemispheres of all participants and included a large number of voxels.”

      (2) The authors should consider including data on the relative prevalence of voxels preferring each category for each ROI (and/or the mean activation level across voxels for each category for each ROI). If some ROIs have very few voxels preferring some categories, there's a chance the observed results are a bit noisy when sorting based on those categories (e.g., if a ROI has essentially no response to a given pair of categories, then there's not likely to be much attentional modulation detectable, because the ROI isn't driven by those categories to begin with).

      We thank the reviewer for the insightful comment.

      We include the percentage of voxels most responsive to each of the four categories in each ROI in the Appendix (Appendix 1-table 1; please see the answer to point “i” of the first reviewer).

      We also provide a table of average activity across voxels for each category in all ROIs as Appendix 1-table 2.

      As shown in the table, voxels show positive activity for all categories in all ROIs except for PPA, where voxels show no response to body and cat stimuli. This might explain why we observed a marginally significant correlation between weight shift and category distance in PPA only. As the reviewer mentions, since this region does not respond to body and cat stimuli, we do not observe a significant change in response due to the shift in attention for some pairs. We include the table in the Appendix and add the explanation to the Results section of the revised manuscript in lines 506-508:

      “Less significant results in PPA might arise from the fact that PPA shows no response to body and cat stimuli and little response to car stimuli (Appendix 1-table 2). Therefore, it is not possible to observe the effect of attention for all category pairs.”

      a. Related - would it make sense to screen voxels for inclusion in analysis based on above-baseline activation for one or both of the categories? [could, for example, imagine you're accidentally measuring from the motor cortex - you'd be able to perform this analysis, but it would be largely nonsensical because there's no established response to the stimuli in either isolated or combined states].

      We performed all the analyses including only voxels that had a significantly positive GLM coefficient across the runs and the results remained the same. We have added the explanation in the Results section in line 509-510.

      (3) Behavioral performance is compared against chance level, but it doesn't seem that 50% is chance for the detection task. The authors write on page 4 that the 1-back repetition occurred between 2-3 times per block, so it doesn't seem to be the case that each stimulus had a 50% chance of being a repetition of the previous one.

      We apologize for the mistake in our report. We have reported the detection rate for the target-present trials (2-3 per block), not the behavioral performance across all trials. We have modified the sentence in the Results section.

      (4) Authors mention that the stimuli are identical for 2-stimulus trials where each category is attended (for a given pair) - but the cue is different, and the cue appears as a centrally-fixated word for 1 s. Is this incorporated into the GLM? I can't imagine this would have much impact, but the strict statement that the goals of the participant are the only thing differentiating trials with otherwise-identical stimuli isn't quite true.

      The word cue was not incorporated as a separate predictor into the GLM. As the reviewer notes, the signals related to the cue and stimuli are mixed. But given that the cues are brief and in the form of words rather than images, they are unlikely to have an effect on the response in the regions of interest.

      To be more accurate, we have included the clarification in the Methods section in lines 181-182:

      “We did not enter the cue to the GLM as a predictor. The obtained voxel-wise coefficients for each condition are thus related to the cue and the stimuli presented in that condition.”

      And in the Results section in lines 425-428:

      “It is important to note that since the cue was not separately modeled in the GLM, the signals related to the cue and the stimuli were mixed. However, given that the cues were brief and presented in the form of words, they are unlikely to have an effect on the responses observed in the higher-level ROIs.”

      (5) Eq 5: I expected there to be some comparison of a and b directly as ratios (e.g., a_1 > b_1, as shown in Fig. 2). The equations used here should be walked through more carefully - it's very hard to understand what this analysis is actually accomplishing. I'm not sure I follow the explanation of relative weights given by the authors, nor how that maps onto the delta_W quantity in Equation 5.

      We provide a direct comparison of a and b, as well as a more thorough clarification of the analysis, in the Methods section in lines 274-276:

      “We first projected the paired vector on the plane defined by the isolated vectors (Figure 2A) and then determined the weight of each isolated vector in the projected vector (Figure 2B).”

      And in lines 286-297:

      “A higher a1 compared to a2 indicates that the paired response pattern is more similar to V_{x^{at}} compared to V_{y^{at}}, and vice versa. For instance, if we calculate the weights of the Body and Car stimuli in the paired response related to the simultaneous presentation of both stimuli, we can write in the LO region: V_{Body^{at} Car} = 0.81 V_{Body} + 0.31 V_{Car}, V_{Body Car^{at}} = 0.43 V_{Body} + 0.68 V_{Car}. Note that these weights are averaged across participants. As can be observed, in the presence of both body and car stimuli, the weight of each stimulus is higher when attended compared to the case when it is unattended. In other words, when attention shifts from body to car stimuli, the weight of the isolated body response (V_{Body}) decreases in the paired response. We can therefore observe that the response in the paired condition is more similar to the isolated body response pattern when body stimuli are attended and more similar to the isolated car response pattern when car stimuli are attended.”

      And lines 303-306:

      “As shown here, even when body stimuli are attended, the effect of the unattended car stimuli is still present in the response, shown in the weight of the isolated car response (0.31). However, this weight increases when attention shifts towards car stimuli (0.68 in the attended case).”
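      The projection step quoted above can be illustrated with a minimal sketch. Equation 4 is not reproduced in this response, so the code assumes that the weights are the ordinary least-squares coefficients of the paired pattern on the two isolated patterns, which is what an orthogonal projection onto their plane gives. The function name `projection_weights` and the toy four-voxel vectors are hypothetical and are not the authors' data or code.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def projection_weights(v_pair, v_x, v_y):
    """Return (a1, a2) such that a1*v_x + a2*v_y is the orthogonal
    projection of v_pair onto the plane spanned by v_x and v_y.

    Solves the 2x2 normal equations of the least-squares problem.
    """
    gxx, gyy, gxy = dot(v_x, v_x), dot(v_y, v_y), dot(v_x, v_y)
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        raise ValueError("isolated patterns are (nearly) collinear")
    px, py = dot(v_pair, v_x), dot(v_pair, v_y)
    a1 = (px * gyy - py * gxy) / det
    a2 = (py * gxx - px * gxy) / det
    return a1, a2


# Hypothetical isolated patterns across four voxels.
v_x = [1.0, 0.0, 0.0, 2.0]
v_y = [0.0, 1.0, 1.0, 0.0]
# A component orthogonal to both patterns; the projection discards it.
off_plane = [0.0, 1.0, -1.0, 0.0]
# Paired pattern built from known weights plus the off-plane component.
v_pair = [0.81 * x + 0.31 * y + 0.5 * e
          for x, y, e in zip(v_x, v_y, off_plane)]

a1, a2 = projection_weights(v_pair, v_x, v_y)
print(round(a1, 2), round(a2, 2))
```

      Because the off-plane component is orthogonal to both isolated patterns, the recovered weights equal the ones used to build the toy paired pattern.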

      We also provide more detailed clarification for the 𝛥w and the relative weights in lines 309-324:

      “To examine whether this increase in the weight of the attended stimulus was constant or depended on the similarity of the two stimuli in cortical representation, we defined the weight shift as the multivariate effect of attention:

      Δw = a1/(a1+a2) – b1/(b1+b2)     (5)

      Here, a1, a2, b1, and b2 are the weights of the isolated responses, estimated using Equation 4. We calculate the weight of the isolated x response once when attention is directed towards x (a1), and a second time when attention is directed towards y (b1). In each case, we calculate the relative weight of the isolated x in the paired response by dividing the weight of the isolated x by the sum of weights of x and y (a1+a2 when attention is directed towards x, and b1+b2 when attention is directed towards y). We then define the weight shift, Δw, as the change in the relative weight of the isolated x response in the paired response when attention shifts from x to y. A higher Δw for a category pair indicates that attention is more efficient in removing the effect of the unattended stimulus in the pair. We used relative weights as a normalized measure to compensate for the difference in the sum of weights for different category pairs. Thus, using the normalized measure, we calculated the share of each stimulus in the paired response. For instance, considering the Body-Car pair, the share of the body stimulus in the paired response was equal to 0.72 and 0.38, when body stimuli were attended and unattended, respectively. We then calculated the change in the share of each stimulus caused by the shift in attention using a simple subtraction (Equation 5: Δw=0.34 for the above example of the Body-Car pair in LO) and used this measure to compare between different pairs.”

      We hope that this clarification makes it easier to understand the multivariate analysis and the weight shift calculation in Equation 5.
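      As a worked illustration of Equation 5, the sketch below plugs in the group-averaged LO weights quoted earlier in this response for the Body-Car pair (0.81 and 0.31 when body is attended, 0.43 and 0.68 when car is attended). The helper names `relative_weight` and `weight_shift` are hypothetical. The sketch applies the formula to the averaged weights directly, which is an assumption and may not match how the authors aggregated across participants.

```python
def relative_weight(w_x, w_y):
    """Share of the isolated-x response in the paired response."""
    return w_x / (w_x + w_y)


def weight_shift(a1, a2, b1, b2):
    """Equation 5: change in the relative weight of x when attention
    moves from x (weights a1, a2) to y (weights b1, b2)."""
    return relative_weight(a1, a2) - relative_weight(b1, b2)


# Group-averaged LO weights quoted in the response (Body-Car pair).
a1, a2 = 0.81, 0.31  # body attended: weights of V_Body and V_Car
b1, b2 = 0.43, 0.68  # car attended: weights of V_Body and V_Car

share_attended = relative_weight(a1, a2)
share_unattended = relative_weight(b1, b2)
delta_w = weight_shift(a1, a2, b1, b2)

print(f"share of body when attended:   {share_attended:.3f}")
print(f"share of body when unattended: {share_unattended:.3f}")
print(f"weight shift (delta w):        {delta_w:.3f}")
```

      Computed this way, the shares and the shift land close to the values reported in the quoted passage. Small differences in the last digit are possible if the ratios were computed per participant before averaging, which this sketch does not do. When all weights are non-negative, each relative weight lies between 0 and 1, so the shift is bounded between -1 and 1.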

      We additionally provide the values of the weights (a1, b1, a2, and b2) for each category pair averaged across participants as Appendix 1-table 4.

      (6) For multivariate analyses (Fig. 6A-E), x axis is normalized (pattern distance based on Pearson correlation), while the delta_W does not seem to be similarly normalized.

      We calculated ΔW by dividing the weights in each condition by the sum of weights in that condition. Thus, we use relative weights which are always in the range of 0 to 1, and ΔW is thus always in the range of -1 to 1. This means that both axes are normalized. Note that even if one axis were not normalized, the relationship between the independent and the dependent variables would remain the same despite the change in the range of the axis.

      (7) Simulating additional scenarios like attention to both categories just increasing the mean response would be helpful - is this how one would capture results like those shown in some panels of Fig. 4?

      We did not have a condition in which participants were asked to attend to both categories. Therefore it was not useful for our simulations to include such a scenario. Please also note that the goal of our simulations is not to capture the exact amount of attentional modulation, but to investigate the effect of target-distractor similarity on the change in attentional modulation (univariate shift and weight shift).

      As for the results in some panels of Figure 4, we have explained the reason underlying higher responses in paired conditions compared to isolated conditions in response to the “weaknesses” section of the second reviewer. We hope that these points satisfy the reviewer’s concern regarding the results in Figure 4 and our simulations.

      (8) Lines 271-276 - the "latter" and "former" are backwards here I think.

      We believe that the sentence was correct, but confusing. We have rephrased the sentence to avoid the confusion in lines 371-376 of the revised manuscript:

      “We modeled two neural populations: a general object-selective population in which each voxel shows preference to a particular category and voxels with different preferences are mixed in with each other (similar to LO and pFs), and a category-selective population in which all voxels have a similar preference for a particular category (similar to EBA and PPA).”

      (9) Line 314 - "body-car" pair is mentioned twice in describing the non-significant result in PPA ROI.

      Thank you for catching the typo. We have changed the second Body-Car to Body-Cat.

      (10) Fig. 5 and Fig. 6 - I was expecting to see a plot that demonstrated variability across subjects rather than across category pairs. Would it be possible to show the distribution of each pair's datapoints across subjects, perhaps by coloring all (e.g.) body-car datapoints one color, all body-cat datapoints another, etc? This would also help readers better understand how category preferences (which differ across ROIs) impact the results.

      We demonstrated variability across category pairs rather than subjects because we aimed to investigate how the variation in the similarity between categories (i.e. category distance) affected the univariate and multivariate effects of attention. The variability across subjects is reflected in the error bars in the bar plots of Figure 5 and Figure 6.

      Here we show the distribution of each category pair’s data points across subjects by using a different color for each pair:

      Author response image 2.

      Univariate shift versus category distance including single-subject data points in all ROIs.

      Author response image 3.

      Weight shift versus category distance including single-subject data points in all ROIs.

      As can be observed in the figures, category preference has little impact on the results. Rather, the similarity in the preference (in the univariate case) or the response pattern (in the multivariate case) to the two presented categories is what impacts the amount of the univariate shift and the weight shift, respectively. For instance, in EBA we observe a low amount of attentional shift both for the Body-Cat pair, with two stimuli for which the ROI is highly selective, and the Car-House pair, including stimuli to which the region shows little response. A similar pattern is observed in the object-selective regions LO and pFs which show high responses to all stimulus categories.

      We believe that the figures including the data points related to all subjects are not strongly informative. However, we agree that using different colors for each category pair helps the readers better understand that category preference has little impact on the results in different ROIs. We therefore present the colored version of Figure 5 and Figure 6 in the revised manuscript, with a different color for each category pair.

      (11) Fig. 5 and Fig. 6 use R^2 as a dependent variable across participants to conclude a positive relationship. While the positive relationship is clear in the scatterplots, which depict averages across participants for each category pair, it could still be the case that there are a substantial number of participants with negative (but predictive, thus high positive R^2) slopes. For completeness and transparency, the authors should illustrate the average slope or regression coefficient for each of these analyses.

      We concluded the positive relationship and calculated the significance in Figure 5 and Figure 6 using the correlation r rather than r^2. This is why the result was not significantly positive in V1. We acknowledge that the use of r-squared in the bar plot leads to confusion. We have therefore changed the bar plots to show the correlation coefficient instead of the r-squared. Furthermore, we have added a table of the correlation coefficient for all participants in all ROIs for the univariate and weight shift analyses supplemental to Figure 5 and Figure 6, respectively.
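      The reviewer's concern about r^2 can be made concrete with a small sketch: two participants with opposite slopes have identical r^2 but correlation coefficients of opposite sign, so only r reveals the direction of the relationship. All numbers below are hypothetical (six values standing in for the six category pairs) and are not taken from the study. The function `pearson_r` is a plain-Python implementation written for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical category distances for six category pairs.
distance = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
# Hypothetical attentional shifts for a participant with a positive slope.
shift_positive = [0.05, 0.11, 0.14, 0.22, 0.24, 0.31]
# The same values with the sign flipped: a participant with a negative slope.
shift_negative = [-s for s in shift_positive]

r_pos = pearson_r(distance, shift_positive)
r_neg = pearson_r(distance, shift_negative)

print(f"positive-slope participant: r = {r_pos:.3f}, r^2 = {r_pos ** 2:.3f}")
print(f"negative-slope participant: r = {r_neg:.3f}, r^2 = {r_neg ** 2:.3f}")
```

      Reporting r per participant, as the revised bar plots do, keeps the sign that r^2 discards.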

      (12) No statement about data or analysis code availability is provided

      Thanks for pointing this out. The fMRI data is available on OSF. We have added a statement about it in the Data Availability section of the revised manuscript in line 669.

    1. Author response:

      We plan to provide full author responses and submit a revised version of our manuscript at the earliest opportunity.

    2. eLife Assessment

      This valuable study examines how different exercise training intensities affect intestinal barrier function and gut microbiota composition over a 6-week period in mice. The evidence supporting the main claims about exercise-induced intestinal injury and microbiota changes is solid, featuring comprehensive histological analysis, molecular characterization, and metabolomic profiling, though key mechanistic insights and causal relationships remain to be established. The findings have practical implications for understanding exercise-induced gastrointestinal stress, particularly the observation that daily moderate exercise may be more damaging to intestinal integrity than vigorous exercise with rest days. Additional experimental validation would strengthen these conclusions.

    3. Reviewer #1 (Public review):

      Summary:

      This article investigates the relationship between different intensities of exercise training and intestinal barrier dysfunction, and further explores the possible mechanisms, including the contribution of stress response, inflammatory response, gut microbiota alterations, and derived metabolites.

      Strengths:

      This article mainly focused on different aspects of the phenotypes and the morphology of intestinal barrier dysfunction induced by exercise training.

      Weaknesses:

      This article lacks the verification of the association of causality among various phenotypes and lacks a comprehensive understanding of the underlying mechanisms of how exercise contributes to intestinal barrier dysfunction.

      (1) For example, the authors claimed that heat shock and ischemia are the causes of the intestinal epithelial damage caused by exercise, but this cannot be established solely by detecting the expression of a few regulators, such as HSF and HSP70, after exercise and by immunohistochemical analysis of intestinal morphology and inflammation.

      (2) Many kinds of intestinal bacteria can produce short-chain fatty acids, such as Faecalibacterium prausnitzii. Did the authors check their abundance in the intestine after exercise training?

      (3) How to define exercise intensity? Was VO2 Max testing used in this study?

      (4) As a strict control, it is recommended to include four exercise training groups: daily vigorous exercise training, daily moderate exercise training, daily vigorous exercise training with intermittent rest days, and daily moderate exercise training with intermittent rest days.

      (5) Are there any differences in diet and metabolism between different groups of mice, which may affect the phenotypes, especially the composition and the diversity of gut microbiota?

    4. Reviewer #2 (Public review):

      Lian et al. provide novel and exciting findings related to exercise-induced intestinal injury that have many implications for those engaging in any kind of training protocol. The authors continue to provide data demonstrating that different forms of exercise training impart a unique signature to the gut microbiota. The paper is well-written, easy to follow, and contains ample information in all sections. The figures are displayed in a clear and comprehensible format, with elegant images. I do have a few concerns regarding some aspects of the paper listed below, but otherwise, I feel that the authors clearly state their objectives, implement valid methods, and summarize their findings with the appropriate conclusions given their experimental constraints.

      (1) The authors performed extensive experiments demonstrating the immediate effects of a bout of exercise on intestinal integrity throughout a 6-week training program. Additionally, the authors go as far as to show that successive exercise sessions appear to augment the observed damage. This is very important and noteworthy data. But I wonder, had the endpoint collections been taken 24 hours+ after the last exercise bout, would the findings be different? My concern is that the 1-hour time point is biased towards seeing more damage. I understand the acute effects of exercise occur and are important to report, but they can be transient, and adaptations ensue. My main concern is that the data shows the onset of the initial damage, but nothing addresses an adaptive or recovery response that could counter the observed exercise-induced intestinal injury. Even metrics such as stool consistency/ pellets per hour/ abnormal defecation measurements could indicate the function of the GI system after exercise and may offer more information related to damage vs recovery.

      (2) An additional concern arises with the model of forced treadmill running. It was previously shown that forced treadmill running resulted in more gut damage compared to voluntary wheel running, with or without dextran sodium sulfate-induced colitis (PMID: 23707215). This type of training appears to be very important in initiating damage to the GI. Understanding how much of this is related to the chosen exercise protocol, forced treadmill running, will be very important for future experiments. Exercise intensity has been suggested to be a major factor in exercise-induced intestinal damage. Therefore, the group designated as MOD-EX in this paper may be over the intensity threshold that limits GI damage. The protocols used in this manuscript may be inherently biased towards enhancing exercise-induced GI damage, which is not necessarily negative, especially when a damaging protocol is needed. However, how much this relates to and can be translated to humans is not clear and needs further experimentation.

      (3) I think the comparison between groups at the specified time point is important, but I believe additional comparisons should be included that show within-group differences across each time point. For example, in the Mod group, does FITC- dextran change between 4 and 6 weeks? Are there morphological change differences between 2, 4, and 6 weeks within each group? Essentially addressing a progression in damage as a function of the duration of exercise training. The authors clearly show exercise-induced damage to the GI, but we do not know how this damage is handled or if the continuation of exercise continues to reinforce the disruption in the epithelial cells.

      (4) The authors describe the purpose of this study as being to identify key regulators of the destruction and reconstruction process of the GI after exercise (introduction lines 128-129). While the authors did sufficient work to describe certain contributing factors, I do not believe they have provided compelling data on the key regulators of exercise-induced intestinal injury, at least experimentally they did not perform exhaustive experiments to identify such. Nor did the authors include data showing any kind of reconstruction that occurs in the GI after exercise. I believe the authors need to revise this statement to reflect that they investigated certain or specific regulators of the damage response in the intestines after exercise training.

      (5) Was water intake monitored and recorded per group? If so I think it would be important to include in the supplemental data. Fluid intake/proper hydration can also contribute to changes in the microbiome and if the data is available, it would complement the food intake. If for any reason the exercise groups were taking in less fluid it may be a confounding factor that should be considered.

      (6) Methods section - Treadmill running exercise protocol, line 143, I think there is a typo with "exercise straining". Did the authors mean to write "exercise training"? If it is indeed a typo, the same appears in the supplemental material under the same section.

      (7) The microbiome analysis is sufficient, and the authors speculate on the possible consequences of the observed changes to the microbiota. However, I believe Figures 5E-G are misleading. The positive correlation is present because of the increase in gut leakiness and the observed exercise-induced increase in microbes. However, the same correlation could be made with any positive adaptation to exercise and the observed gut leakiness. I believe those correlations, as described now, postulate these microbes (members of the family Lachnospiraceae) are associated with increased gut leakiness. However, this correlation is not compelling as it is, and additional experiments are warranted to justify this. It cannot be ruled out that the microbes are increasing due to exercise itself. Additionally, reports have suggested species within the Lachnospiraceae family do increase in response to exercise in mice and are associated with positive adaptations to exercise (PMID: 28862530, PMID: 37940330, PMID: 36517598). With this, it should be noted that Lachnospiraceae was also found to be negatively associated with endurance performance (PMID: 35002754). Therefore, specific species or strains of Lachnospiraceae may be highly responsive to exercise while others are not. Without deeper sequencing it is impossible to tease this out and therefore, the authors should be careful with any interpretation beyond discussing what is observed. Additionally, these correlations between Lachnospiraceae and gut leakiness should be interpreted cautiously or more experiments should be included which demonstrate these microbes are connected to gut leakiness. Much more research is needed to determine exactly what strains are positively and negatively associated with exercise adaptations and performance.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haploinsufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel.

      Weaknesses:

      Despite the interesting and novel questions, there are significant issues regarding the experimental design and potential misinterpretations of key findings. Consequently, the manuscript contributes little to our understanding of SynGap1 loss mechanisms.

      Major issues in the second version of the manuscript:

      The review of the first version identified major issues and contradictions in the sEPSC and mEPSC data. These were not resolved by the revision, and the new control experiments instead confirmed the contradiction.

      In the original review I stated: "One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar."

      Contradictions remained after the revision of the manuscript. On one hand, the authors claimed in the revised version that "We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g), indicating that the observed difference in sEPSC amplitude (Figure 1b) could arise from decreased network excitability". On the other hand, later they show "no significative difference in either amplitude or inter-event intervals between sEPSC and mEPSC, suggesting that in acute slices from adult A1, most sEPSCs may actually be AP independent." The latter means that sEPSCs and mEPSCs are the same type of events, which should have the same sensitivity to manipulations.

      We understand that the data are confusing. Our results suggest a diverse population of PV+ cells, with varying reliance on action potential-dependent and -independent release. Several PV+ cells indeed show TTX sensitivity (reduced EPSC event amplitudes following TTX application: see Fig.1c-f, at the end of this document), but their individual responses are diluted when all cells are pooled together. To account for this variability, we are currently recording sEPSCs followed by mEPSCs from more mice of both genotypes. We will rephrase the text to reflect the updated data accordingly, in keeping with the editors’ and reviewers’ suggestions.

      Concerns about the quality of the synapse counting experiments were addressed by showing additional images in a different and explaining quantification. However, the admitted restriction of the analysis of excitatory synapses to the somatic region represents a limitation, as these synapses account for only a small fraction of the total excitation, even if the slightly larger amplitudes of their EPSPs are considered.

      We agree with the reviewer that restricting the anatomical analysis of excitatory synapses to the PV cell somatic region is a limitation, which we have already highlighted in the discussion of the revised manuscript. Recent studies, based on serial block-face scanning electron microscopy, suggest that cortical PV+ interneurons receive more robust excitatory inputs to their perisomatic region as compared to pyramidal neurons (see for example, Hwang et al. 2021, Cerebral Cortex, http://doi.org/10.1093/cercor/bhaa378). It is thus possible that putative glutamatergic synapses, analysed by vGlut1/PSD95 colocalisation around PV+ cell somata, may be representative of a substantial excitatory input population. Similar immunolabeling and quantification approaches coupled with mEPSC analysis have been reported in several publications by other labs (for example Bernard et al 2022, Science 378, doi: 10.1126/science.abm7466; Exposito-Alonso et al, 2020 eLife, doi: 10.7554/eLife.57000). Since analysing putative excitatory synapses onto PV+ dendrites would be difficult and require a much longer time, we will rephrase the text to more clearly highlight the rationale and limitation of this approach.

      New experiments using paired-pulse stimulation provided an answer to issues 3 and 4. Note that the numbering of the Figures in the responses and manuscript is not consistent.

      We are glad that the reviewer found that the new paired-pulse experiments answered previously raised concerns. We will correct the discrepancy in figure numbers in the manuscript.

      I agree that the low sampling rate of the APs does not change the observed large differences in AP threshold; however, the phase plots are still inconsistent in the sense that there appears to be an offset, as all values are shifted to more depolarized membrane potentials, including threshold, AP peak, and AHP peak. This consistent shift may be due to a non-biological difference between the two sets of recordings, and, importantly, it may negate the interpretation of the I/f curve results (Fig. 5e).

      We agree with the reviewers that a higher sampling rate would allow a more accurate assessment of different parameters, such as AP height, half-width, rise time, etc., while it would not affect the large differences in AP threshold we observed between control and mutant mice. Since the phase plots do not add to our result analysis, we will remove them. The offset shown in Fig.5 was due to the unfortunate choice of two random neurons; this offset is not present in the different examples shown in Fig.7. We apologize for the confusion.

      Additional issues:

      The first paragraph of the Results mentioned that the recorded cells were identified by immunolabelling and axonal localization. However, neither the Results nor the Methods mention the criteria and levels of measurements of axonal arborization.

      As suggested, we will add this information in the revised manuscript.

      The other issues of the first review were adequately addressed by the Authors and the manuscript improved by these changes.

      Reviewer #3 (Public review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences between control and mutants in both interneuron populations, although they claim a predominance in PV+ cells. These results suggest that altered PV-interneuron functions in the auditory cortex may contribute to the network dysfunctions observed in Syngap1 haploinsufficiency-related intellectual disability.

      The subject of the work is interesting, and most of the approach is rather direct and straightforward, which are strengths. There are also some methodological weaknesses and interpretative issues that reduce the impact of the paper.

      (1) Supplementary Figure 3: recording and data analysis. The data of Supplementary Figure 3 show no differences either in the frequency or amplitude of synaptic events recorded from the same cell in control (sEPSCs) vs TTX (mEPSCs). This suggests that, under the experimental conditions of the paper, sEPSCs are AP-independent quantal events. However, I am concerned by the high variability of the individual results included in the Figure. Indeed, several datapoints show dramatically different frequencies in control vs TTX, which may be explained by unstable recording conditions. It would be important to present these data as time course plots, so that stability can be evaluated. Also, the claim of lack of effect of TTX should be corroborated by positive control experiments verifying that TTX is working (block of action potentials, for example). Lastly, it is not clear whether the application of TTX was consistent in time and duration in all the experiments and the paper does not clarify what time window was used for quantification.

      We understand the reviewer’s concern about high variability. To account for this variability, we are currently recording sEPSC followed by mEPSC from more mice of both genotypes.

      Indeed, we confirmed that TTX was working several times through the time course of this study, in different aliquots prepared from the same TTX vial used for all experiments. The results of the last test we performed, showing that TTX application blocks action potentials (2 recordings, one from a SST+ and one from a PV+ interneuron), are shown in Fig.1a,b at the end of this document. TTX was applied using the same protocol for all recorded neurons. In particular, sEPSCs were first sampled over a 2 min period. TTX (1μM; Alomone Labs) was then perfused into the recording chamber at a flow rate of 2 mL/min. We then waited for 5 min before sampling mEPSCs over a 2 min period. We will add this information in the revised manuscript methods. Finally, Fig.1g-j shows series resistance (Rs) over time for 4 different PV+ interneurons, indicating recording stability. These results are representative of the entire population of recorded neurons, which we have meticulously analysed one by one.

      (2) Figure 1 and Supplementary Figure 3: apparent inconsistency. If, as the authors claim, TTX does not affect sEPSCs (either in the control or mutant genotype, Supplementary Figure 3 and point 1 above), then comparing sEPSC and mEPSC in control vs mutants should yield identical results. In contrast, Figure 1 reports a _selective_ reduction of sEPSCs amplitude (not in mEPSCs) in mutants, which is difficult to understand. The proposed explanation relying on different pools of synaptic vesicles mediating sEPSCs and mEPSCs does not clarify things. If this was the case, wouldn't it also imply a decrease of event frequency following TTX addition? However, this is not observed in Supplementary Figure 3. My understanding is that, according to this explanation, recordings in control solution would reflect the impact of two separate pools of vesicles, whereas, in the presence of TTX, only one pool would be available for release. Therefore, TTX should cause a decrease in the frequency of the recorded events, which is not what is observed in Supplementary Figure 3.

      Our results suggest a diverse population of PV+ cells, with varying reliance on action potential-dependent and -independent release. Several PV+ cells indeed show TTX sensitivity (reduced EPSC event amplitudes following TTX application: See Fig.1c-f, at the end of this document), but their individual responses are diluted when all cells are pooled together. As mentioned above, we are currently recording sEPSCs followed by mEPSCs from more mice of both genotypes, to account for the large variability. We will rephrase the text in the revised manuscript according to the updated data and reviewers’ suggestions.

      (3) Figure 1: statistical analysis. Although I do appreciate the efforts of the authors to illustrate both cumulative distributions and plunger plots with individual data, I am confused by how the cumulative distributions of Figure 1b (sEPSC amplitude) may support statistically significant differences between genotypes, but this is not the case for the cumulative distributions of Figure 1g (inter mEPSC interval), where the curves appear even more separated. A difference in mEPSC frequency would also be consistent with the data of Supplementary Fig 2b, which otherwise are difficult to reconcile. I would encourage the authors to use the Kolmogorov-Smirnov test rather than a t-test for the comparison of cumulative distributions.

      We thank the reviewer for this suggestion. We used both cumulative distributions and plunger plots with individual data because they convey two different kinds of information. Cumulative distributions highlight where the differences lie (the deltas between the groups), while plunger plots with individual data show the variability between data points. In histogram 1g, the variability is greater than in 1b (due to the smaller sample size in 1g), which leads to larger error bars and directly impacts the statistical outcome. So, while the delta is larger in 1g, the variability is also greater. In contrast, the delta in 1b is smaller, as is the variability, which in turn affects the statistical outcome. To address this issue, we are currently increasing the number of recordings.
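
      For reference, the two-sample Kolmogorov-Smirnov statistic suggested by the reviewer is the largest vertical gap between two empirical cumulative distributions. The following is a minimal Python sketch and not the authors' analysis code; the function name `ks_statistic` and the example amplitudes are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Empirical CDFs only change at observed values, so checking those is enough.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical event amplitudes (pA), one list per genotype; illustration only.
control = [8.0, 9.5, 10.2, 12.1, 14.0]
chet = [7.1, 8.2, 9.0, 9.8, 11.0]
print(ks_statistic(control, chet))
```

      Note that events pooled across cells and animals are not independent observations, so a p-value derived from this statistic on pooled events should be read with the same caution about nested data that applies to a t-test.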

      We will include Kolmogorov-Smirnov analysis in the revision, as suggested; nevertheless, we will base our conclusions on statistical results generated by the linear mixed model (LMM), modelling animal as a random effect and genotype as the fixed effect. We used this statistical analysis since we considered the number of mice as independent replicates and the number of cells in each mouse as repeated/correlated measures. The reason we decided to use LMM for our statistical analyses is based on the growing concern over reproducibility in biomedical research and the ongoing discussion on how data are analysed (see for example, Yu et al. (2022), Neuron 110:21-35, https://doi.org/10.1016/j.neuron.2021.10.030; Aarts et al. (2014), Nat Neurosci 17, 491–496, https://doi.org/10.1038/nn.3648). We acknowledge that patch-clamp data have historically been analysed using t-tests and analysis of variance (ANOVA), or equivalent non-parametric tests. However, these tests assume that individual observations (recorded neurons in this case) are independent of each other. Whether neurons from the same mouse are independent or correlated variables is an unresolved question, but independence does not appear likely from a biological point of view. Statisticians have developed effective methods to analyze correlated data, including LMM. In parallel, we also tested the data by using the standard parametric and non-parametric analyses and reported these results as well (Tables 1-9, and S1-S2).
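
      The concern about correlated cells within a mouse can be made concrete with the intraclass correlation (ICC). The sketch below is not the LMM used by the authors; it is a simpler, related diagnostic (one-way ANOVA estimate of the ICC, plus the approximate effective sample size it implies). The function names and the example values are hypothetical.

```python
from statistics import mean


def icc_oneway(cells_by_mouse):
    """One-way ANOVA estimate of the intraclass correlation.

    cells_by_mouse maps a mouse ID to the list of per-cell measurements.
    """
    groups = [list(v) for v in cells_by_mouse.values()]
    a = len(groups)                       # number of mice
    n_total = sum(len(g) for g in groups) # number of cells
    if a < 2 or n_total <= a:
        raise ValueError("need >= 2 mice and at least one mouse with >= 2 cells")
    grand = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    ms_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (a - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n_total - a)
    # k0 is the "average" cluster size adjusted for unbalanced designs.
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (a - 1)
    denom = ms_between + (k0 - 1) * ms_within
    return (ms_between - ms_within) / denom if denom else 0.0


def effective_n(cells_by_mouse):
    """Approximate effective sample size: N / (1 + (mean cluster size - 1) * ICC)."""
    sizes = [len(v) for v in cells_by_mouse.values()]
    icc = max(0.0, icc_oneway(cells_by_mouse))  # negative estimates are treated as 0
    return sum(sizes) / (1 + (mean(sizes) - 1) * icc)


# Hypothetical AP thresholds (mV) from three mice; illustration only.
data = {"mouse_1": [-43.0, -42.0, -44.0], "mouse_2": [-36.0, -35.0], "mouse_3": [-40.0, -41.0, -39.5]}
print(icc_oneway(data), effective_n(data))
```

      When the ICC is near 1, the effective sample size approaches the number of mice rather than the number of cells, which is the situation in which cell-level t-tests overstate significance and a mixed model with animal as a random effect is the more defensible choice.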

      (4) Methods. I still maintain that a threshold at around -20/-15 mV for the first action potential of a train seems too depolarized (see some datapoints of Fig 5c and Fig 7c) for a healthy spike. This suggests that some cells were either in precarious conditions or that the capacitance of the electrode was not compensated properly.

      As suggested by the reviewer, we will exclude the neurons with threshold at -20/-15 mV. In addition, we performed statistical analysis with and without these cells (data reported below) and found that whether these cells are included or excluded, the statistical significance of the results does not change.

      Fig.5c: including the 2 outliers from cHet group with values of -16.5 and 20.6 mV: -42.6±1.01 mV in control, n=33 cells from 15 mice vs -35.3±1.2 mV in cHet, n=40 cells from 17 mice, ***p<0.001, LMM; excluding the 2 outliers from cHet group -42.6±1.01 mV in control, n=33 cells from 15 mice vs -36.2±1.1 mV in cHet, n=38 cells from 17 mice, ***p<0.001, LMM.

      Fig.7c: including the 2 outliers from cHet group with values of -16.5 and 20.6 mV: -43.4±1.6 mV in control, n=12 cells from 9 mice vs -33.9±1.8 mV in cHet, n=24 cells from 13 mice, **p=0.002, LMM; excluding the 2 outliers from cHet group -43.4±1.6 mV in control, n=12 cells from 9 mice vs -35.4±1.7 mV in cHet, n=22 cells from 13 mice, *p=0.037, LMM.

      (5) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties (Figure 8d,e); however, their evoked firing properties were affected with fewer AP generated in response to the same depolarizing current injection".

      This sentence is intrinsically contradictory. Action potentials triggered by current injections are dependent on the integration of passive and active properties. If the curves of Figure 8f are different between genotypes, then some passive and/or active property MUST have changed. It is an inescapable conclusion. The general _blanket_ statement of the authors that there are no significant changes in active and passive properties is in direct contradiction with the current/#AP plot.

      We shall rephrase the text according to the reviewer’s suggestion to better represent the data. As discussed in the first revision, it's possible that other intrinsic factors, not assessed in this study, may have contributed to the effect shown in the current/#AP plot.

      (6) The phase plots of Figs 5c, 7c, and 7h suggest that the frequency of acquisition/filtering of current-clamp signals was not appropriate for fast waveforms such as spikes. The first two papers indicated by the authors in their rebuttal (Golomb et al., 2007; Stevens et al., 2021) did not perform a phase plot analysis (like those included in the manuscript). The last work quoted in the rebuttal (Zhang et al., 2023) did perform phase plot analysis, but data were digitized at a frequency of 20 kHz (not 10 kHz as incorrectly indicated by the authors) and filtered at 10 kHz (not 2-3 kHz as done by the authors in the manuscript). To me, this remains a concern.

      We agree with the reviewer that a higher sampling rate would allow a more accurate assessment of different AP parameters, such as AP height, half-width, rise time, etc. The papers were cited in the context of determining AP threshold, not performing phase plot analysis. We apologize for the confusion and error. Further, as mentioned above, we will remove the phase plots since they do not add relevant information.

      (7) The general logical flow of the manuscript could be improved. For example, Fig 4 seems to indicate no morphological differences in the dendritic trees of control vs mutant PV cells, but this conclusion is then rejected by Fig 6. Maybe Fig 4 is not necessary. Regarding Fig 6, did the authors check the integrity of the entire dendritic structure of the cells analyzed (i.e. no dendrites were cut in the slice)? This is critical as the dendritic geometry may affect the firing properties of neurons (Mainen and Sejnowski, Nature, 1996).

      As suggested by the reviewer, we will remove Fig.4. All the reconstructions used for dendritic analysis contained intact cells with no evidently cut dendrites.

      Author response image 1.

      (a, b) Representative voltage responses of a SST+ cell (a) and a PV+ cell (b) in absence (left) and presence (right) of TTX in response to depolarizing current injections corresponding to threshold current and 2x threshold current. (c-f) Cumulative histograms of sEPSC/mEPSC amplitude (bin width 0.5 pA) and frequency (bin width 10 ms) recorded from four PV+ cells. sEPSCs were recorded for 2 minutes, then TTX (1μM; Alomone Labs) was perfused into the recording chamber. After 5 minutes, mEPSCs were recorded for 2 minutes. (g, h, i, j) Time course plots of series resistance (Rs) of the four representative PV+ cells shown in c-f before (sEPSC) and during the application of TTX (mEPSC).


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study is designed to assess the role of Syngap1 in regulating the physiology of the MGE-derived PV+ and SST+ interneurons. Syngap1 is associated with some mental health disorders, and PV+ and SST+ cells are the focus of many previous and likely future reports from studies of interneuron biology, highlighting the translational and basic neuroscience relevance of the authors' work.

      Strengths of the study are using well-established electrophysiology methods and the highly controlled conditions of ex vivo brain slice experiments combined with a novel intersectional mouse line, to assess the role of Syngap1 in regulating PV+ and SST+ cell properties. The findings revealed that in the mature auditory cortex, Syngap1 haploinsufficiency decreases both the intrinsic excitability and the excitatory synaptic drive onto PV+ neurons from Layer 4. In contrast, SST+ interneurons were mostly unaffected by Syngap1 haploinsufficiency. Pharmacologically manipulating the activity of voltage-gated potassium channels of the Kv1 family suggested that these channels contributed to the decreased PV+ neuron excitability by Syngap insufficiency. These results therefore suggest that normal Syngap1 expression levels are necessary to produce normal PV+ cell intrinsic properties and excitatory synaptic drive, albeit, perhaps surprisingly, inhibitory synaptic transmission was not affected by Syngap1 haploinsufficiency.

      Since the electrophysiology experiments were performed in the adult auditory cortex, while Syngap1 expression was potentially affected since embryonic stages in the MGE, future studies should address two important points that were not tackled in the present study. First, what is the developmental time window in which Syngap1 insufficiency disrupted PV+ neuron properties? Albeit the embryonic Syngap1 deletion most likely affected PV+ neuron maturation, the properties of Syngap-insufficient PV+ neurons do not resemble those of immature PV+ neurons. Second, whereas the observation that Syngap1 haploinsufficiency affected PV+ neurons in auditory cortex layer 4 suggests auditory processing alterations, MGE-derived PV+ neurons populate every cortical area. Therefore, without information on whether Syngap1 expression levels are cortical area-specific, the data in this study would predict that by regulating PV+ neuron electrophysiology, Syngap1 normally controls circuit function in a wide range of cortical areas, and therefore a range of sensory, motor and cognitive functions. These are relatively minor weaknesses regarding interpretation of the data in the present study that the authors could discuss.

      We agree with the reviewer on the proposed open questions, which we now discuss in the revised manuscript. We do have experimental evidence suggesting that Syngap1 mRNA is expressed by PV+ and SST+ neurons in different cortical areas, during early postnatal development and in adulthood (Jadhav et al., 2024); therefore, we agree that it will be important, in future experiments, to tackle the question of when the observed phenotypes arise.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haploinsufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel.

      Weaknesses:

      Despite the interesting and novel questions, there are significant concerns regarding the experimental design and data quality, as well as potential misinterpretations of key findings. Consequently, the current manuscript fails to contribute substantially to our understanding of SynGap1 loss mechanisms and may even provoke unnecessary controversies.

      Major issues:

      (1) One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar.

      We understand the reviewer’s perspective; indeed, we asked ourselves the very same question regarding why the sEPSC and mEPSC frequency fall within a similar range when we analysed neuron means (bar graphs). We thus recorded sEPSCs followed by mEPSCs from several PV neurons (control and cHet) and included these data in the revised version of the manuscript (new Supplementary Figure 3). We found that the average amplitudes and frequency of mEPSCs together with their respective cumulative probability curves were not significantly different from those of sEPSCs. We rephrased the manuscript to present potential interpretations of the data.

      We hope that we have correctly interpreted the reviewer's concern. If the question is why we do not observe a significant difference in the average frequency when comparing sEPSC and mEPSC in control mice, this could be explained by the fact that increased mean amplitude of sEPSCs was primarily driven by alterations in large sEPSCs (>9-10pA, as shown in cumulative probability in Fig. 1b right), with smaller ones being relatively unaffected. Consequently, a reduction in sEPSC amplitude may not necessarily result in a significant decrease in frequency since their values likely remain above the detection threshold of 3 pA. 

      If the question is whether we should see the same parameters affected by the genetic manipulation in both sEPSC and mEPSC, then another critical consideration is the involvement of the releasable pool in mEPSCs versus sEPSCs. Current knowledge suggests that activity-dependent and -independent release may not necessarily engage the same pool of vesicles or target the same postsynaptic sites. This concept has been extensively explored (Sara et al., 2005; Sara et al., 2011; reviewed in Ramirez and Kavalali, 2011; Kavalali, 2015). Consequently, while we may have traditionally interpreted activity-dependent and -independent data assuming they utilize the same pool, this is no longer accurate. The current discussion in the field revolves around understanding the mechanisms underlying such phenomena. Therefore, comparisons between sEPSCs and mEPSCs may not yield conclusive data but rather speculative interpretations.

      (2) Another significant concern is the quality of synapse counting experiments. The authors attempted to colocalize pre- and postsynaptic markers Vglut1 and PSD95 with PV labelling. However, several issues arise. Firstly, the PV labelling seems confined to soma regions, with no visible dendrites. Given that the perisomatic region only receives a minor fraction of excitatory synapses, this labeling might not accurately represent the input coverage of PV cells. Secondly, the resolution of the images is insufficient to support clear colocalization of the synaptic markers. Thirdly, the staining patterns are peculiar, with PSD95 puncta appearing within regions clearly identified as somas by Vglut1, hinting at possible intracellular signals. Furthermore, PSD95 seems to delineate potential apical dendrites of pyramidal cells passing through the region, yet Vglut1+ partners are absent in these segments, which are expected to be the marker of these synapses here. Additionally, the cumulative density of Vglut2 and Vglut1 puncta exceeds expectations, and it's surprising that subcortical fibers labeled by Vglut2 are comparable in number to intracortical Vglut1+ axon terminals. Ideally, N(Vglut1)+N(Vglut2) should be equal or less than N(PSD95), but this is not the case here. Consequently, these results cannot be considered reliable due to these issues.

      We apologize, as it appears that the images we provided in the first submission have caused confusion. The selected images represent a single focal plane of a confocal stack, which was visually centered on the PV cell somata. We chose just one confocal plane because we thought it showed more clearly the apposition of presynaptic and postsynaptic immunolabeling around the somata. In the revised version of the manuscript, we now provide higher magnification images, which will clearly show how we identified and selected the region of interest for the quantification of colocalized synaptic markers (Supplemental Figure 2). In our confocal stacks, we can also identify PV immunolabeled dendrites and colocalized vGlut1/PSD95 or vGlut2/PSD95 puncta on them; but these do not appear in the selected images because, as explained, only one focal plane, centered on the PV cell somata, was shown. 

      We acknowledge the reviewer's point that in PV+ cells the majority of excitatory inputs are formed onto dendrites; however, we focused on the somatic excitatory inputs to PV cells, because despite their lower number, they produce much stronger depolarization in PV neurons than dendritic excitatory inputs (Hu et al., 2010; Norenberg et al., 2010). Further, quantification of perisomatic putative excitatory synapses is more reliable since, by using PV immunostaining, we can visualize the soma and larger primary dendrites, but smaller, higher-order dendrites are not always detectable. Of note, PV-positive somata receive more excitatory synapses than SST-positive and pyramidal neuron somata, as found by electron microscopy studies in the visual cortex (Hwang et al., 2021; Elabbady et al., 2024).

      Regarding the comment on the density of vGlut1 and vGlut2 puncta, the reason that the numbers appear high and similar between the two markers is because we present normalized data (cHet normalized to their control values for each set of immunolabelling) to clearly represent the differences between genotypes. We now provide a more detailed explanation of our methods in the revised manuscript. Briefly, immunostained sections were imaged using a Leica SP8-STED confocal microscope, with a 63x oil-immersion objective (NA 1.4) at 1024 × 1024 pixels, z-step = 0.3 μm, stack size of ~15 μm. Images were acquired from the auditory cortex from at least 3 coronal sections per animal. All the confocal parameters were maintained constant throughout the acquisition of an experiment. All images shown in the figures are from a single confocal plane. To quantify the number of vGlut1/PSD95 or vGlut2/PSD95 putative synapses, images were exported as TIFF files and analyzed using Fiji (ImageJ) software. We first manually outlined the profile of each PV cell soma (identified by PV immunolabeling). At least 4 innervated somata were selected in each confocal stack. We then used a series of custom-made macros in Fiji as previously described (Chehrazi et al, 2023). After applying background subtraction (rolling value = 10) and Gaussian blur (σ value = 2) filters, the stacks were binarized and vGlut1/PSD95 or vGlut2/PSD95 puncta were independently identified around the perimeter of a targeted soma in the focal plane with the highest soma circumference. Puncta were quantified after filtering particles for size (between 0 and 2 μm²) and circularity (between 0 and 1). Data quantification was done by investigators blind to the genotype, and presented as normalized data over control values for each experiment.
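
      To illustrate the particle-filtering step described above, the following Python sketch applies a size range (0 to 2 μm²) and a circularity range (taken here to be 0 to 1) to a list of detected puncta. It is not the authors' Fiji macro; the function names and example puncta are hypothetical. Circularity is computed with the standard ImageJ definition, 4π × area / perimeter², capped at 1.

```python
import math


def circularity(area, perimeter):
    """ImageJ-style circularity: 4*pi*area / perimeter**2, capped at 1.0."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return min(1.0, 4.0 * math.pi * area / perimeter ** 2)


def keep_punctum(area_um2, perimeter_um, max_area_um2=2.0, circ_range=(0.0, 1.0)):
    """Return True if a punctum passes the size and circularity filters."""
    circ = circularity(area_um2, perimeter_um)
    size_ok = 0.0 < area_um2 <= max_area_um2
    circ_ok = circ_range[0] <= circ <= circ_range[1]
    return size_ok and circ_ok


# Hypothetical puncta as (area in um^2, perimeter in um); illustration only.
puncta = [(0.4, 2.4), (1.8, 5.0), (3.0, 8.0)]
kept = [p for p in puncta if keep_punctum(*p)]
print(len(kept), "of", len(puncta), "puncta kept")
```

      With the circularity range left at 0 to 1, every shape passes that criterion, so in this configuration the size limit is the filter that actually excludes particles.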

      (3) One observation from the minimal stimulation experiment was concluded by an unsupported statement. Namely, the change in the onset delay cannot be attributed to a deficit in the recruitment of PV+ cells, but it may suggest a change in the excitability of TC axons.

      We agree with the reviewer, please see answer to point below.

      (4) The conclusions drawn from the stimulation experiments are also disconnected from the actual data. To make conclusions about TC release, the authors should have tested release probability using established methods, such as paired-pulse changes. Instead, the only observation here is a change in the AMPA components, which remained unexplained.

      As suggested, we performed additional paired-pulse ratio experiments at different intervals. We found that, in contrast with Control mice, evoked excitatory inputs to layer IV PV+ cells showed paired-pulse facilitation in cHet mice (Figure 3g, h), suggesting that thalamocortical presynaptic sites likely have decreased release probability in mutant compared to control mice.  We rephrased the text according to the data obtained from this new experiment.
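
      For readers less familiar with the measure, the paired-pulse ratio is the amplitude of the second evoked response divided by that of the first; values above 1 indicate facilitation, which is commonly associated with a lower initial release probability. The sketch below is illustrative only; the function names and the optional tolerance parameter are hypothetical and are not taken from the manuscript.

```python
def paired_pulse_ratio(first_amp_pa, second_amp_pa):
    """Ratio of second to first EPSC amplitude (sign-independent)."""
    if first_amp_pa == 0:
        raise ValueError("first response amplitude must be non-zero")
    return abs(second_amp_pa) / abs(first_amp_pa)

def classify_ppr(ppr, tolerance=0.0):
    """Label a ratio as facilitation (>1), depression (<1), or no change."""
    if ppr > 1.0 + tolerance:
        return "facilitation"
    if ppr < 1.0 - tolerance:
        return "depression"
    return "no change"
```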

      (5) The sampling rate of CC recordings is insufficient to resolve the temporal properties of the APs. Therefore, the phase-plots cannot be interpreted (e.g. axonal and somatic AP components are not clearly separated), raising questions about how AP threshold and peak were measured. The low sampling rate also masks the real derivative of the AP signals, making them apparently faster.

      We acknowledge that a higher sampling rate would provide a more detailed and smoother phase-plot. However, in the context of action potential parameters analysis here, it is acceptable to use sampling rates ranging from 10 kHz to 20 kHz (Golomb et al., 2007; Stevens et al., 2021; Zhang et al., 2023), which are considered adequate in the context of the present study. Indeed, our study aims to evaluate "relative" differences in the electrophysiological phenotype when comparing groups following a specific genetic manipulation. A sampling rate of 10 kHz is commonly employed in similar studies, including those conducted by our collaborator and co-author S. Kourrich (e.g., Kourrich and Thomas 2009, Kourrich et al., 2013), as well as others (Russo et al., 2013; Ünal et al., 2020; Chamberland et al., 2023). Despite being acquired at a lower sampling rate than potentially preferred by the reviewer, our data clearly demonstrate significant differences between the experimental groups, especially for parameters that are negligibly or not affected by the sampling rate used here (e.g., #spikes/input, RMP, Rin, Cm, Tm, AP amplitude, AP latency, AP rheobase).

      Regarding the phase-plots, a higher sampling rate would indeed have resulted in smoother curves. However, the differences were sufficiently pronounced to discern the relative variations in action potential waveforms between the experimental groups.
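
      To make the dependence on sampling rate concrete, a phase plot can be built by pairing each voltage sample with a finite-difference estimate of dV/dt; at 10 kHz the time step is 0.1 ms, so the derivative is averaged over that interval. The sketch below is a generic illustration, not the analysis code used in the study, and the dV/dt criterion of 10 mV/ms used to read out a threshold is an assumed, hypothetical value.

```python
def phase_plot(voltage_mv, sampling_rate_hz):
    """Return (V, dV/dt) pairs, with dV/dt in mV/ms from forward differences."""
    dt_ms = 1000.0 / sampling_rate_hz
    points = []
    for i in range(len(voltage_mv) - 1):
        slope = (voltage_mv[i + 1] - voltage_mv[i]) / dt_ms
        points.append((voltage_mv[i], slope))
    return points

def threshold_from_phase_plot(points, dvdt_criterion=10.0):
    """Return the first voltage at which dV/dt reaches the (assumed) criterion."""
    for voltage, slope in points:
        if slope >= dvdt_criterion:
            return voltage
    return None
```

      With a coarser time step, fewer points fall on the rising phase of the action potential, which is why the curves look less smooth even though between-group comparisons made at the same sampling rate remain valid.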

      A related issue is that the Methods section lacks essential details about the recording conditions, such as bridge balance and capacitance neutralization.

      We indeed performed bridge balance and neutralized the capacitance before starting every recording. We added the information in the methods.

      (6) Interpretation issue: One of the most fundamental measures of cellular excitability, the rheobase, was differentially affected by cHet in BCshort and BCbroad. Yet, the authors concluded that the cHet-induced changes in the two subpopulations are common.

      We are uncertain if we have correctly interpreted the reviewer's comment. While we observed distinct impacts on the rheobase (Fig. 7d and 7i), there seems to be a common effect on the AP threshold (Fig. 7c and 7h), as interpreted and indicated in the final sentence of the results section for Figure 7. If our response does not address the reviewer's comment adequately, we would greatly appreciate it if the reviewer could rephrase their feedback.

      (7) Design issue:

      The Kv1 blockade experiments are disconnected from the main manuscript. There is no experiment that shows the causal relationship between changes in DTX and cHet cells. It is only an interesting observation on AP halfwidth and threshold. However, how they affect rheobase, EPSCs, and other topics of the manuscript are not addressed in DTX experiments.

      Furthermore, Kv1 currents were never measured in this work, nor was the channel density tested. Thus, the DTX effects are not necessarily related to changes in PV cells, which can potentially generate controversies.

      While we acknowledge the reviewer's point that Kv1 currents and density weren't specifically tested, an important insight provided by Fig. 5 is the prolonged action potential latency. This delay is significantly influenced by slowly inactivating subthreshold potassium currents, namely the D-type K+ current. It's worth noting that D-type current is primarily mediated by members of the Kv1 family. The literature supports a role for Kv1.1-containing channels in modulating responses to near-threshold stimuli in PV cells (Wang et al., 1994; Goldberg et al., 2008; Zurita et al., 2018). However, we recognize that besides the Kv1 family, other families may also contribute to the observed changes.

      To address this concern, we revised the manuscript by referring to the more accurate term "D-type K+ current", and rephrased the discussion to clarify the limits of our approach. It is not our intention to open unnecessary controversy, but rather to present the data we obtained. We believe this approach and rephrasing the discussion as proposed will prevent unnecessary controversy and instead foster fruitful discussions.

      (8) Writing issues:

      Abstract:

      The auditory system is not mentioned in the abstract.

      One statement in the abstract is unclear. What is meant by "targeting Kv1 family of voltage-gated potassium channels was sufficient..."? "Targeting" could refer to altered subcellular targeting of the channels, simple overexpression/deletion in the target cell population, or targeted mutation of the channel, etc. Only the final part of the Results revealed that it was none of the above; rather, these channels were blocked selectively.

      We agree with the reviewer and we will rephrase the abstract accordingly.

      Introduction:

      There is a contradiction in the introduction. The second paragraph describes in detail the distinct contribution of PV and SST neurons to auditory processing. But at the end, the authors state that "relatively few reports on PV+ and SST+ cell-intrinsic and synaptic properties in adult auditory cortex". Please be more specific about the unknown properties.

      We agree with the reviewer and we will rephrase more specifically.

      (9) The introduction emphasizes the heterogeneity of PV neurons, which certainly influences the interpretation of the results of the current manuscript. However, the initial experiments did not consider this and handled all PV cell data as a pooled population.

      In the initial experiments, we handled all PV cell data together because we wanted to be rigorous and not make assumptions on the different PV cells, which in later experiments we distinguished based on the intrinsic properties alone. Nevertheless, based on this and other reviewers’ comments, we completely rewrote the introduction in the revised manuscript to increase both focus and clarity.

      (10) The interpretation of the results strongly depends on unpublished work, which potentially provide the physiological and behavioral contexts about the role of GABAergic neurons in SynGap-haploinsufficiency. The authors cite their own unpublished work, without explaining the specific findings and relation to this manuscript.

      We agree with the reviewer and provided more information and updated references in the revised version of this manuscript. Our work is now in press in Journal of Neuroscience.

      (11) The introduction of Sholl analysis experiments mentions SOM staining, however, there is no such data about this cell type in the manuscript.

      We thank the reviewer for noticing the error; we replaced SOM with SST (SOM and SST are two commonly used acronyms for Somatostatin-expressing interneurons).

      Reviewer #3 (Public Review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences at both levels, although predominantly in PV+ cells. These results suggest that altered PV-interneuron functions in the auditory cortex may contribute to the network dysfunction observed in Syngap1 haploinsufficiency-related intellectual disability. The subject of the work is interesting, and most of the approach is direct and quantitative, which are major strengths. There are also some weaknesses that reduce its impact for a broader field.

      (1) The choice of mice with conditional (rather than global) haploinsufficiency makes the link between the findings and Syngap1 relatively easy to interpret, which is a strength. However, it also remains unclear whether an entire network with the same mutation at a global level (affecting also excitatory neurons) would react similarly.

      We agree with the reviewer and now discuss this important caveat in the revised manuscript.

      (2) There are some (apparent?) inconsistencies between the text and the figures. Although the authors appear to have used a sophisticated statistical analysis, some datasets in the illustrations do not seem to match the statistical results. For example, neither Fig 1g nor Fig 3f (eNMDA) reach significance despite large differences. 

      We respectfully disagree; we do not think the text and figures are inconsistent. In the cited examples, the large apparent differences in mean values do not reach significance because of the large variability in the data; further, we did not exclude any data points, because we wanted to be rigorous. In particular, for Fig. 1g, statistical analysis shows a significant increase in the inter-mEPSC interval (*p=0.027, LMM) when all events are considered (cumulative probability plots), while there is no significant difference in the inter-mEPSC interval for the inter-cell mean comparison (inset, p=0.354, LMM). The inter-cell mean comparison does not show a difference with the Mann-Whitney test either (p=0.101; the data are not normally distributed, hence the choice of the Mann-Whitney test). For Fig. 3f (eNMDA), the higher mean value for cHet versus control is driven by two data points that are particularly high, while the other data points overlap with the control values. The Mann-Whitney test also shows no statistical difference (p=0.174).

      In the manuscript, discussion of the data is based on the results of the LMM analysis, which takes into account both the number of cells and the number of mice from which these cells were recorded. We chose this statistical approach because it does not rely on the assumption that cells recorded from the same mouse are independent observations. In the supplemental tables, we provide, for each data set, the results of the statistical analysis done with both LMM and the most commonly used Mann-Whitney test (for data that are not normally distributed) or t-test (for normally distributed data).

      Also, the legend to Fig 9 indicates the presence of "a significant decrease in AP half-width from cHet in absence or presence of a-DTX", but the bar graph does not seem to show that.

      We apologize for our lack of clarity. In legend 9, we reported the statistical comparisons between 1) vehicle-treated cHET vs control PV+ cells and 2) a-DTX-treated cHET vs control PV+ cells. We rephrased the legend of the figure to avoid confusion.

      (3) The authors mention that the lack of differences in synaptic current kinetics is evidence against a change in subunit composition. However, in some Figures, for example, 3a, the kinetics of the recorded currents appear dramatically different. It would be important to know and compare the values of the series resistance between control and mutant animals.

      We agree with the reviewer that there appears to be a qualitative difference in eNMDA decay between conditions, although quantified eNMDA decay itself is similar between groups. We used a cutoff of 15% for the series resistance (Rs), which is considerably more stringent than the cutoffs typically used in electrophysiology, the vast majority of which are between 20 and 30%. To address this concern, we re-examined Rs, compared it between groups, and found no difference in eAMPA (Control mice: 13.2±0.5, n=16 cells from 7 mice vs cHet mice: 13.7±0.3, n=14 cells from 7 mice; LMM, p=0.432) or eNMDA (Control mice: 12.7±0.7, n=6 cells from 3 mice vs cHet mice: 13.8±0.7, n=6 cells from 5 mice; LMM, p=0.231). Thus, the apparent qualitative difference in eNMDA decay stems from inter-cell variability rather than inter-group differences. Notably, this discrepancy between the trace (Fig. 3a) and the data (Fig. 3f, right) is largely due to inter-cell variability, particularly in eNMDA, where a higher but non-significant decay rate is driven by a couple of very high values (Fig. 3f, right). In the revised manuscript, we now show traces that better represent our findings.

      (4) A significant unexplained variability is present in several datasets. For example, the AP threshold for PV+ includes points between -50 and -40 mV, but also values at around -20/-15 mV, which seems too depolarized to generate healthy APs (Fig 5c, Fig7c).

      We acknowledge the variability in AP threshold data, with some APs appearing too depolarized to generate healthy spikes. However, we meticulously examined each AP that spiked at these depolarized thresholds and found that other intrinsic properties (such as Rin, Vrest, AP overshoot, etc.) all indicate that these cells are healthy. Therefore, to maintain objectivity and provide unbiased data to the community, we opted to include them in our analysis. It's worth noting that similar variability has been observed in other studies (Bengtsson Gonzales et al., 2020; Bertero et al., 2020).

      Further, we conducted a significance test on AP threshold excluding these potentially unhealthy cells and found that the significant differences persist. After removing two outliers from the cHet group with values of -16.5 and 20.6 mV, we obtain: -42.6±1.01 mV in control, n=33, 15 mice vs -36.2±1.1 mV in cHet, n=38 cells, 17 mice (LMM, ***p<0.001). Thus, whether these cells are included or excluded, our interpretations and conclusions remain unchanged.

      We would like to clarify that these data have not been corrected for the junction potential, as described in the revised version.

      (5) I am unclear as to how the authors quantified colocalization between VGluts and PSD95 at the low magnification shown in Supplementary Figure 2.

      We apologize for our lack of clarity. Although the analysis was done at high resolution, the figures were focused on showing multiple PV somata receiving excitatory inputs. We added higher magnification figures and more detailed information in the methods of the revised version. Please also see our response to reviewer #2.

      (6) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties", but this claim would seem to be directly refuted by the data of Fig 8f. In the absence of changes in either active or passive membrane properties shouldn't the current/#AP plot remain unchanged?

      While we acknowledge the theoretical expectation that changes in intrinsic parameters should correlate with alterations in neuronal firing, the absence of differences in the parameters analyzed in this study is not incompatible with the clear and significant decrease in firing rate observed in cHet SST+ cells. It's indeed possible that other intrinsic factors, not assessed in this study, may have contributed to this effect. However, exploring these mechanisms is beyond the scope of our current investigation. We rephrased the discussion and added this limitation of our study in the revised version.

      (7) The plots used for the determination of AP threshold (Figs 5c, 7c, and 7h) suggest that the frequency of acquisition of current-clamp signals may not have been sufficient; this value is not included in the Methods section.

      This study utilized a sampling rate of 10 kHz, which is a standard rate for action potential analysis in the present context. While we acknowledge that a higher sampling rate could have enhanced the clarity of the phase plot, our recording conditions, as detailed in our response to Rev#2/comment#5, were suitable for the objectives of this study.

      Reference list

      Bengtsson Gonzales C, Hunt S, Munoz-Manchado AB, McBain CJ, Hjerling-Leffler J (2020) Intrinsic electrophysiological properties predict variability in morphology and connectivity among striatal Parvalbumin-expressing Pthlh-cells Scientific Reports 10: 15680 https://doi.org/10.1038/s41598-020-72588-1

      Bertero A, Zurita H, Normandin M, Apicella AJ (2020) Auditory long-range parvalbumin cortico-striatal neurons. Frontiers in Neural Circuits 14:45 http://doi.org/10.3389/fncir.2020.00045

      Chamberland S, Nebet ER, Valero M, Hanani M, Egger R, Larsen SB, Eyring KW, Buzsáki G, Tsien RW (2023) Brief synaptic inhibition persistently interrupts firing of fast-spiking interneurons Neuron 111:1264–1281 http://doi.org/10.1016/j.neuron.2023.01.017

      Chehrazi P, Lee KKY, Lavertu-Jolin M, Abbasnejad Z, Carreño-Muñoz MI, Chattopadhyaya B, Di Cristo G (2023). The p75 neurotrophin receptor in preadolescent prefrontal parvalbumin interneurons promotes cognitive flexibility in adult mice Biological Psychiatry 94:310-321 doi: https://doi.org/10.1016/j.biopsych.2023.04.019

      Elabbady L, Seshamani S, Mu S, Mahalingam G, Schneider-Mizell C, Bodor AL, Bae JA, Brittain D, Buchanan J, Bumbarger DJ, Castro MA, Dorkenwald S, Halageri A, Jia Z, Jordan C, Kapner D, Kemnitz N, Kinn S, Lee K, Li K, Lu R, Macrina T, Mitchell E, Mondal SS,  Popovych S, Silversmith W, Takeno M, Torres R,  Turner NL, Wong W,  Wu J, Yin W, Yu SC, The MICrONS Consortium,  Seung S,  Reid C,  Da Costa NM,  Collman F (2024) Perisomatic features enable efficient and dataset wide cell-type classifications across large-scale electron microscopy volumes bioRxiv, https://doi.org/10.1101/2022.07.20.499976

      Goldberg EM, Clark BD, Zagha E, Nahmani M, Erisir A, Rudy B (2008) K+ Channels at the axon initial segment dampen near-threshold excitability of neocortical fast-spiking GABAergic interneurons. Neuron 58:387–400 https://doi.org/10.1016/j.neuron.2008.03.003

      Golomb D, Donner K, Shacham L, Shlosberg D, Amitai Y, Hansel D (2007) Mechanisms of firing patterns in fast-spiking cortical interneurons PLoS Computational Biology 3:e156 http://doi.org/10.1371/journal.pcbi.0030156

      Hu H, Martina M, Jonas P (2010). Dendritic mechanisms underlying rapid synaptic activation of fast-spiking hippocampal interneurons. Science 327:52–58. http://doi.org/10.1126/science.1177876

      Hwang YS, Maclachlan C, Blanc J, Dubois A, Petersen CH, Knott G, Lee SH (2021). 3D ultrastructure of synaptic inputs to distinct gabaergic neurons in the mouse primary visual cortex. Cerebral Cortex 31:2610–2624 http://doi.org/10.1093/cercor/bhaa378

      Jadhav V, Carreno-Munoz MI, Chehrazi P, Michaud JL, Chattopadhyaya B, Di Cristo G (2024) Developmental Syngap1 haploinsufficiency in medial ganglionic eminence-derived interneurons impairs auditory cortex activity, social behavior and extinction of fear memory The Journal of Neuroscience in press.

      Kavalali E (2015) The mechanisms and functions of spontaneous neurotransmitter release Nature Reviews Neuroscience 16:5–16. https://doi.org/10.1038/nrn3875

      Kourrich S, Thomas MJ (2009) Similar neurons, opposite adaptations: psychostimulant experience differentially alters firing properties in accumbens core versus shell Journal of Neuroscience 29:12275-12283 http://doi.org/10.1523/JNEUROSCI.3028-09.2009

      Kourrich S, Hayashi T, Chuang JY, Tsai SY, Su TP, Bonci A (2013) Dynamic interaction between sigma-1 receptor and Kv1.2 shapes neuronal and behavioral responses to cocaine Cell 152:236–247. http://doi.org/10.1016/j.cell.2012.12.004 

      Norenberg A, Hu H, Vida I, Bartos M, Jonas P (2010) Distinct nonuniform cable properties optimize rapid and efficient activation of fast-spiking GABAergic interneurons Proceedings of the National Academy of Sciences 107:894–9. http://doi.org/10.1073/pnas.0910716107

      Ramirez DM, Kavalali ET (2011) Differential regulation of spontaneous and evoked neurotransmitter release at central synapses Current Opinion in Neurobiology 21:275-282 https://doi.org/10.1016/j.conb.2011.01.007

      Russo G, Nieus TR, Maggi S, Taverna S (2013) Dynamics of action potential firing in electrically connected striatal fast-spiking interneurons Frontiers in Cellular Neuroscience 7:209 https://doi.org/10.3389/fncel.2013.00209

      Sara Y, Virmani T, Deák F, Liu X, Kavalali ET (2005) An isolated pool of vesicles recycles at rest and drives spontaneous neurotransmission Neuron 45:563-573 https://doi.org/10.1016/j.neuron.2004.12.056

      Sara Y, Bal M, Adachi M, Monteggia LM, Kavalali ET (2011) Use-dependent AMPA receptor block reveals segregation of spontaneous and evoked glutamatergic neurotransmission Journal of Neuroscience 14:5378-5382 https://doi.org/10.1523/JNEUROSCI.5234-10.2011

      Stevens SR, Longley CM, Ogawa Y, Teliska LH, Arumanayagam AS, Nair S, Oses-Prieto JA, Burlingame AL, Cykowski MD, Xue M, Rasband MN (2021) Ankyrin-R regulates fast-spiking interneuron excitability through perineuronal nets and Kv3.1b K+ channels eLife 10:e66491 http://doi.org/10.7554/eLife.66491  

      Ünal CT, Ünal B, Bolton MM (2020) Low-threshold spiking interneurons perform feedback inhibition in the lateral amygdala Brain Structure and Function 225:909–923. http://doi.org/10.1007/s00429-020-02051-4

      Wang H, Kunkel DD, Schwartzkroin PA, Tempel BL (1994) Localization of Kv1.1 and Kv1.2, two K channel proteins, to synaptic terminals, somata, and dendrites in the mouse brain. The Journal of Neuroscience 14:4588-4599. https://doi.org/10.1523/JNEUROSCI.14-08-04588.1994

      Zhang YZ, Sapantzi S, Lin A, Doelfel SR, Connors BW, Theyel BB (2023) Activity-dependent ectopic action potentials in regular-spiking neurons of the neocortex. Frontiers in Cellular Neuroscience 17 https://doi.org/10.3389/fncel.2023.1267687

      Zurita H, Feyen PLC, Apicella AJ (2018) Layer 5 callosal parvalbumin-expressing neurons: a distinct functional group of GABAergic neurons. Frontiers in Cellular Neuroscience 12:53 https://doi.org/10.3389/fncel.2018.00053

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) The introduction nicely summarizes multiple aspects of cortical auditory physiology and auditory stimulus processing, but the experiments in this study are performed ex vivo in acute slices. I wonder if it would be beneficial to shorten the initial parts of the introduction and consider a more focused approach highlighting, for example, to what extent Syngap1 expression levels change during development and/or vary across cortical areas. What cortical cell types express Syngap1 in addition to PV+ and SST+ cells? If multiple cell types normally express Syngap1, the introduction could clarify that the present study investigated Syngap1 insufficiency by isolating its effects in PV+ and SST+ neurons, a condition that may not reflect the situation in mental health disorders, but that would allow to better understand the global effects of Syngap1 deficiency.

      We thank the reviewer for this very helpful suggestion. We have changed the introduction as suggested.

      (2) Because mEPSCs are not affected in Syngap+/- interneurons, the authors conclude that the lower sEPSC amplitude is due to decreased network activity. However, it is likely that the absence of significant difference (Fig 1g), is due to lack of statistical power (control: 18 cells from 7 mice, cHet: 8 cells from 4 mice). By contrast, the number of experiments recording sIPSCs and mIPSCs (Fig 2) is much larger. Hence, it seems that adding mEPSC data would allow the authors to more convincingly support their conclusions. To more directly test whether Syngap insufficiency affects excitatory inputs by reducing network activity, ideally the authors would want to record sEPSCs followed by mEPSCs from each PV+ neuron (control or cHet). Spontaneous event frequency and amplitude should be higher for sEPSCs than mEPSCs, and Syngap1 deficiency should affect only sEPSCs, since network activity is abolished following tetrodotoxin application for mEPSC recordings.

      We agreed with the reviewer’s suggestion, and recorded sEPSCs followed by mEPSCs from PV+ neurons in control and cHet mice (Figure supplement 3). In both genotypes, we found no significant difference in either amplitude or inter-event intervals between sEPSCs and mEPSCs, suggesting that in acute slices from adult A1, most sEPSCs may actually be action potential-independent. While perhaps surprising at first glance, this result can be explained by recently published work suggesting that action potential-dependent (sEPSC) and -independent (mEPSC) release may not necessarily engage the same pool of vesicles or target the same postsynaptic sites (Sara et al., 2005; Sara et al., 2011; reviewed in Ramirez and Kavalali, 2011; Kavalali, 2015). Consequently, while activity-dependent and -independent data have traditionally been interpreted on the assumption that they draw on the same vesicle pool, this assumption no longer holds; indeed, the current discussion in the field revolves around understanding the mechanisms underlying such phenomena.

      Therefore, comparisons between sEPSCs and mEPSCs may not yield conclusive data but rather speculative interpretations. We have added this caveat in the result section.

      (3) The interpretation of the data of experiments studying thalamic inputs and single synapses should be clarified and/or rewritten. First, it is not clear why the authors assume they are selectively activating thalamic fibers with electrical stimulation. Presumably the authors applied electrical stimulation to the white matter, but the methods are not clearly explained. Furthermore, the authors could clarify how stimulation of a single axon was verified and how they could distinguish release failures from stimulation failures, since the latter are inherent to using minimal stimulation conditions. Interpretations of changes in potency, quantal content, failure rate, etc, depend on the ability to distinguish release failures from stimulation failures. In addition, can the authors provide information on how many synapses a thalamic axon does establish with each postsynaptic PV+ cell from control or Syngap-deficient mice? Even if stimulating a single thalamic axon would be possible, if the connections from single thalamic axons onto single PV+ or SST+ cells are multisynaptic, this would make the interpretation of minimal stimulation experiments in terms of single synapses very difficult or unfeasible. In the end, changes in EPSCs evoked by electrical stimulation may support the idea that Syngap1 insufficiency decreases action potential evoked release, that in part mediates sEPSC, but without indicating the anatomical identity of the stimulated inputs (thalamic, other subcortical or cortico-cortical?).

      We agree with the reviewer: our protocol does not allow the stimulation of single synapses/axons, but rather bulk stimulation of multiple axons. We thank the reviewer for bringing up this important point. In our experiment, we reduced the stimulus intensity until no EPSC was observed, then increased it until we reached the minimum intensity at which we could observe an EPSC. We now explain this approach more clearly in the methods and changed the results section by removing any reference to “minimal” stimulation.

      Electrical stimulation of thalamic radiation could indeed activate not only monosynaptic thalamic fibers but also a polysynaptic (corticothalamic and/or corticocortical) EPSC component. To identify monosynaptic thalamocortical connections, we used as criteria the onset latencies of EPSCs and the jitter, obtained from the standard deviation of onset latencies, as previously published by other studies (Richardson et al., 2009; Blundon et al., 2011; Chun et al., 2013). Onset latencies were defined as the time interval between the beginning of the stimulation artifact and the onset of the EPSC. Monosynaptic connections are characterized by short onset latencies and low jitter (Richardson et al., 2009; Blundon et al., 2011; Chun et al., 2013). In our experiments, the initial slopes of EPSCs evoked by white matter stimulation had short onset latencies (mean onset latency, 4.27 ± 0.11 ms, N=16 neurons in controls, and 5.07 ± 0.07 ms, N=14 neurons in cHet mice) and low onset latency jitter (0.24 ± 0.03 ms in controls vs 0.31 ± 0.03 ms in cHet mice), suggestive of activation of monosynaptic thalamocortical connections (Richardson et al., 2009; Blundon et al., 2011; Chun et al., 2013). Of note, a previous study in adult mice (Krause et al., 2014) showed that local field potentials evoked by electrical stimulation of medial geniculate nucleus or thalamic radiation were comparable. The information is included in the revised manuscript, in the methods section.
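
      The latency and jitter criteria described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, jitter is taken as the sample standard deviation of onset latencies across trials, and the cutoff values passed to the classifier must be chosen by the user because no specific thresholds are given here.

```python
import statistics

def onset_latencies(stim_times_ms, epsc_onset_times_ms):
    """Latency per trial: EPSC onset minus start of the stimulation artifact."""
    return [onset - stim for stim, onset in zip(stim_times_ms, epsc_onset_times_ms)]

def latency_and_jitter(latencies_ms):
    """Return (mean latency, jitter), with jitter as the sample SD across trials."""
    return statistics.mean(latencies_ms), statistics.stdev(latencies_ms)

def looks_monosynaptic(latencies_ms, max_latency_ms, max_jitter_ms):
    """True when both mean latency and jitter fall under user-chosen cutoffs."""
    mean_latency, jitter = latency_and_jitter(latencies_ms)
    return mean_latency <= max_latency_ms and jitter <= max_jitter_ms
```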

      (4) The data presentation in Fig 6 is a bit confusing and could be clarified. First, in cluster analysis (Fig 6a), the authors may want to clarify why a correlation between Fmax and half width is indicative of the presence of subgroups. Second, performing cluster analysis based on two variables alone (Fmax and half-width) might not be very informative, but perhaps the authors could better explain why they chose two variables and particularly these two variables? For reference, see the study by Helm et al. 2013 (cited by the authors) using multivariate cluster analysis. Additionally, the authors may want to clarify, for non-expert readers, whether or not finding correlations between variables (heatmap in the left panel of Fig 6b) is a necessary condition to perform PCA (Fig 6b right panel).

      We apologize for the confusion and thank the reviewer for the comment. The choice of Fmax and half-width to cluster PV+ subtypes was based on past observations of atypical PV+ cells characterized by a slower AP half-width and lower maximal AP firing frequency (Nassar et al., 2015; Bengtsson Gonzales et al., 2018; Ekins et al., 2020; Helm et al., 2013). Based on these previous studies, we performed hierarchical clustering of AP half-width and Fmax-initial values based on Euclidean distance. However, in our case some control PV+ cells showed no correlation between these parameters (as it appears in Fig 6a left, right, and 6b left), requiring the use of an additional 11 parameters to perform Principal Component Analysis (PCA). PCA takes a large data set with many variables per observation and reduces them to a smaller set of summary indices (Murtagh and Heck 1987). We chose a total of 13 parameters that are largely unrelated, while excluding others that are highly correlated and represent similar features of membrane properties (e.g., AP rise time and AP half-width). PCA applies a multiexponential fit to the data, and each new uncorrelated variable [principal component (PC)] can describe more than one original parameter (Helm et al., 2013). We added information in the methods section as suggested.
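
      As an illustration of the first step of Euclidean-distance clustering on two parameters, the sketch below builds the pairwise distance matrix that a hierarchical clustering routine would take as input. It is not the procedure used in the study: the function names are hypothetical, and z-scoring each parameter before computing distances is an assumption made here so that half-width (ms) and Fmax (Hz) contribute on comparable scales.

```python
import math
import statistics

def zscore(column):
    """Standardize a list of values to zero mean and unit sample SD."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mu) / sd for x in column]

def euclidean_distance_matrix(half_width_ms, fmax_hz):
    """Pairwise Euclidean distances between cells in z-scored (half-width, Fmax) space."""
    points = list(zip(zscore(half_width_ms), zscore(fmax_hz)))
    return [[math.dist(p, q) for q in points] for p in points]
```

      Cells with similar half-width and Fmax end up separated by small distances and are merged first by an agglomerative algorithm; the linkage rule used to merge clusters is a separate choice that this sketch does not make.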

      Minor points:

      (1) In Fig 3a, the traces illustrating the effects of syngap haplo-insufficiency on AMPA and NMDA EPSCs do not seem to be the best examples? For instance, the EPSCs in syngap-deficient neurons show quite different kinetics compared with control EPSCs, however Fig 3f suggests similar kinetics.

      We changed the traces as suggested.

      (2) In the first paragraph of results, it would be helpful to clarify that the experiments are performed in acute brain slices and state the age of animals.

      Done as suggested.

      (3) The following two sentences are partly redundant and could be synthesized or merged to shorten the text: "Recorded MGE-derived interneurons, identified by GFP expression, were filled with biocytin, followed by posthoc immunolabeling with anti-PV and anti-SST antibodies. PV+ and SST+ interneuron identity was confirmed using neurochemical marker (PV or SST) expression and anatomical properties (axonal arborisation location, presence of dendritic spines)."

      We rewrote the paragraph to avoid redundancy, as suggested.

      (4) In the following sentence, the mention of dendritic spines is not sufficiently clear, does it mean that spine density or spine morphology differ between PV and SST neurons?: "PV+ and SST+ interneuron identity was confirmed using neurochemical marker (PV or SST) expression and anatomical properties (axonal arborisation location, presence of dendritic spines)."

      We meant absence or presence of spines. PV+ cells typically do not have spines, while SST+ interneurons do. We corrected the sentence to improve clarity.

      (5) The first sentence of the discussion might be a bit of an overinterpretation of the data? Dissecting the circuit mechanisms of abnormal auditory function with Syngap insufficiency requires experiments very different from those reported in this paper. Moreover, that PV+ neurons from auditory cortex are particularly vulnerable to Syngap deficiency is possible, but this question is not addressed directly in this study because the effects on auditory cortex PV+ neurons were not thoroughly compared with those on PV+ cells from other cortical areas.

      We agreed with the reviewer and changed this sentence accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Minor issues:

      "glutamatergic synaptic inputs to Nkx2.1+ interneurons from adult layer IV (LIV) auditory cortex" it would be more correct if this sentence used "in adult layer IV" instead of "from".

      We made the suggested changes.

      It would be useful information to provide whether the slice quality and cellular health was affected in the cHet animals.

      We did not observe any difference between control and cHet mice in terms of slice quality, success rate of recordings, or cellular health. We added this sentence to the methods.

      Were BCshort and BCbroad observed within the same slice, same animals? This information is important to exclude the possibility of an experimental origin of the distinct AP width.

      We have indeed found both types of BCs in the same animal, and often in the same slice.

      Reviewer #3 (Recommendations For The Authors):

      (1) The introduction is rather diffuse but should be more focused on Syngap1, cellular mechanisms and interneurons. For example, the authors do not even define what Syngap1 is.

      We thank the reviewer for this very helpful suggestion. We have changed the introduction as suggested.

      (2) Some of the figures appear very busy with small fonts that are difficult to read. Also, it is very hard to appreciate the individual datapoints in the blue bars. Could a lighter color please be used?

      We thank the reviewer for this helpful suggestion. We made the suggested changes.

      (3) The strength/limit of using a conditional knockout should be discussed.

      Done as suggested, in the revised Discussion.

      (4) Statistical Methods should be described more in depth and probably some references should be added. Also, do (apparent?) inconsistencies between the text and the figures depend on the analysis used? For example, neither Fig 1g nor Fig 3f (eNMDA) reach significance despite large differences in the illustration. Maybe the authors could acknowledge this trend and discuss potential reasons for not reaching significance. Also, the legend to Fig 9 indicates the presence of "a significant decrease in AP half-width from cHet in absence or presence of a-DTX", but the bar graph does not show that.

      The interpretation of the data is based on the results of the LMM analysis, which takes into account both the number of cells and the number of mice from which these cells were recorded. We chose this statistical approach because it does not rely on the assumption that cells recorded from the same mouse are independent variables. We further provided detailed information about the statistical analysis in the tables associated with each figure, where we show both the LMM and the most commonly used Mann-Whitney test (for non-normally distributed data) or t-test (for normally distributed data) for each data set. As suggested, we added a reference about LMM in the Methods section.

      (5) Were overall control and mutant mice of the same average postnatal age? Is there a reason for the use of very young animals? Was any measured parameter correlated with age?

      Control and mutant mice were of the same postnatal age. In particular, the age range was 75.5 ± 1.8 postnatal days for control group and 72.1 ± 1.7 postnatal days in cHet group (mean ± S.E.M.). We did not use any young mice. We have added this information in the methods.

      (6) Figure 6. First, was the dendritic arborization of all cells fully intact? Second, if Figure 7 uses the same data of Figure 5 after a reclassification of PV+ cells into the two defined subpopulations, then Figure 5 should probably be eliminated as redundant. Also, if the observed changes impact predominantly one PV+ subpopulation, maybe one could argue that the synaptic changes could be (at least partially) explained by the more limited dendritic surface of BC-short (higher proportion in mutant animals) rather than only cellular mechanisms.

      All the reconstructions used for dendritic analysis contained intact cells with no evidently cut dendrites. We added this information in the methods section.

      Regarding Figure 5 we recognize the reviewer’s point of view; however, we think both figures are informative. In particular, Figure 5 shows the full data set, avoiding assumptions on the different PV cells subtype classification, and can be more readily compared with several previously published studies.

      We apologize for our lack of clarity, which may have led to a misunderstanding. In Figure 6i our data show that BC-short from cHet mice have a larger dendritic surface and a higher number of branching points compared to BC-short from control mice. 

      (7) I am rather surprised by the AP threshold of ~-20/-15 mV observed in the datapoints of some figures. Did the authors use capacitance neutralization for their current-clamp recordings? What was the sampling rate used? Some of the phase plots (Vm vs dV/dT) suggests that it may have been too low.

      See responses to public review.

      (8) Please add the values of the series resistance of the recordings and a comparison between control and mutant animals.

      As suggested, we re-examined the series resistance values (Rs), comparing Rs between groups and found no difference for Rs in eAMPA (Control mice: 13.2±0.5,  n=16 cells from 7 mice; cHet mice: 13.7±0.3, n=14 cells from 7 mice; LMM, p=0.432) and eNMDA (Control mice: 12.7±0.7, n=6 cells from 3 mice; cHet mice: 13.8±0.7, n=6 cells from 5 mice;  LMM, p=0.231).

      (9) I am unclear as to how the authors quantified colocalization between VGluts and PSD95 at the low magnification shown in Supplementary Figure 2. Could they please show images at higher magnification?

      Quantification was done on high resolution images. Immunostained sections were imaged using a Leica SP8-STED confocal microscope, with an oil immersion 63x (NA 1.4) at 1024 X 1024, zoom=1, z-step =0.3 μm, stack size of ~15 μm. As suggested by the reviewer, we changed the figure by including images at higher magnification.

      (10) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties", but this claim would seem to be directly refuted by the data of Fig 8f. In the absence of changes in either active or passive membrane properties shouldn't the current/#AP plot remain unchanged?

      The reduction in intrinsic excitability observed in SST+ cells from cHet mice could be due to intrinsic factors not assessed in this study. However, exploring these mechanisms is beyond the scope of our current investigation. We rephrased the discussion and added this limitation of our study in the revised version.

      (11) Please check references as some are missing from the list.

      Thank you for noticing this issue, which is now corrected.

      References  

      Bengtsson Gonzales C, Hunt S, Munoz-Manchado AB, McBain CJ, Hjerling-Leffler J (2020) Intrinsic electrophysiological properties predict variability in morphology and connectivity among striatal Parvalbumin-expressing Pthlh-cells Scientific Reports 10:15680 https://doi.org/10.1038/s41598-020-72588-1

      Blundon JA, Bayazitov IT, Zakharenko SS (2011) Presynaptic gating of postsynaptically expressed plasticity at mature thalamocortical synapses The Journal of Neuroscience 31:16012-25 https://doi.org/10.1523/JNEUROSCI.3281-11.2011

      Chun S, Bayazitov IT, Blundon JA, Zakharenko SS (2013) Thalamocortical long-term potentiation becomes gated after the early critical period in the auditory cortex The Journal of Neuroscience 33:7345-57 https://doi.org/10.1523/JNEUROSCI.4500-12.2013

      Ekins TG, Mahadevan V, Zhang Y, D’Amour JA, Akgül G, Petros TJ, McBain CJ (2020) Emergence of non-canonical parvalbumin-containing interneurons in hippocampus of a murine model of type I lissencephaly eLife 9:e62373 https://doi.org/10.7554/eLife.62373

      Helm J, Akgul G, Wollmuth LP (2013) Subgroups of parvalbumin-expressing interneurons in layers 2/3 of the visual cortex Journal of Neurophysiology 109:1600–1613 https://doi.org/10.1152/jn.00782.2012

      Kavalali E (2015) The mechanisms and functions of spontaneous neurotransmitter release Nature Reviews Neuroscience 16:5–16 https://doi.org/10.1038/nrn3875

      Krause BM, Raz A, Uhlrich DJ, Smith PH, Banks MI (2014) Spiking in auditory cortex following thalamic stimulation is dominated by cortical network activity Frontiers in Systemic Neuroscience 8:170. https://doi.org/10.3389/fnsys.2014.00170

      Murtagh F, Heck A (1987) Multivariate Data Analysis. Dordrecht, The Netherlands: Kluwer Academic.

      Nassar M, Simonnet J, Lofredi R, Cohen I, Savary E, Yanagawa Y, Miles R, Fricker D (2015) Diversity and overlap of Parvalbumin and Somatostatin expressing interneurons in mouse presubiculum Frontiers in Neural Circuits 9:20. https://doi.org/10.3389/fncir.2015.00020

      Ramirez DM, Kavalali ET (2011) Differential regulation of spontaneous and evoked neurotransmitter release at central synapses Current Opinion in Neurobiology 21:275-282 https://doi.org/10.1016/j.conb.2011.01.007

      Richardson RJ, Blundon JA, Bayazitov IT, Zakharenko SS (2009) Connectivity patterns revealed by mapping of active inputs on dendrites of thalamorecipient neurons in the auditory cortex. The Journal of Neuroscience 29:6406-17 https://doi.org/10.1523/JNEUROSCI.3028-09.2009

      Sara Y, Virmani T, Deák F, Liu X, Kavalali ET (2005) An isolated pool of vesicles recycles at rest and drives spontaneous neurotransmission Neuron 45:563-573 https://doi.org/10.1016/j.neuron.2004.12.056

      Sara Y, Bal M, Adachi M, Monteggia LM, Kavalali ET (2011) Use-dependent AMPA receptor block reveals segregation of spontaneous and evoked glutamatergic neurotransmission Journal of Neuroscience 14:5378-5382 https://doi.org/10.1523/JNEUROSCI.5234-10.2011

    2. eLife Assessment

      This study provides valuable evidence indicating that SynGap1 regulates the synaptic drive and membrane excitability of parvalbumin- and somatostatin-positive interneurons in the auditory cortex. Since haplo-insufficiency of SynGap1 has been linked to intellectual disabilities without a well-defined underlying cause, the central question of this study is timely. While in their revision the authors successfully addressed questions related to changes in thalamocortical presynaptic excitability, the support for the authors' conclusions is incomplete as concerns around the interpretability of the spontaneous/mini EPSCs, interpretation of results related to excitability, restriction of anatomical analysis of excitatory synapses to the somatic region, and technical problems regarding phase plots remain unresolved.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haplo-insufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel.

      Weaknesses:

      Despite the interesting and novel questions, there are significant issues regarding the experimental design and potential misinterpretations of key findings. Consequently, the manuscript contributes little to our understanding of SynGap1 loss mechanisms.

      Major issues in the second version of the manuscript:

      In the review of the first version there were major issues and contradictions with the sEPSC and mEPSC data, which were not resolved after the revision; the new control experiments rather confirmed the contradiction.

      In the original review I stated: "One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar."

      Contradictions remained after the revision of the manuscript. On one hand, the authors claimed in the revised version that "We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g), indicating that the observed difference in sEPSC amplitude (Figure 1b) could arise from decreased network excitability". On the other hand, later they show "no significative difference in either amplitude or inter-event intervals between sEPSC and mEPSC, suggesting that in acute slices from adult A1, most sEPSCs may actually be AP independent." The latter means that sEPSCs and mEPSCs are the same type of events, which should have the same sensitivity to manipulations.

      Concerns about the quality of the synapse counting experiments were addressed by showing additional images in a different and explaining quantification. However, the admitted restriction of the analysis of excitatory synapses to the somatic region represents a limitation, as these synapses include only a small fraction of the total excitation, even if the slightly larger amplitudes of their EPSPs are considered.

      New experiments using paired-pulse stimulation provided an answer to issues 3 and 4. Note that the numbering of the Figures in the responses and in the manuscript is not consistent.

      I agree that the low sampling rate of the APs does not change the observed large differences in AP threshold; however, the phase plots are still inconsistent in the sense that there appears to be an offset, as all values are shifted to more depolarized membrane potentials, including threshold, AP peak, and AHP peak. This consistent shift may be due to non-biological differences between the two sets of recordings and, importantly, it may negate the interpretation of the I/f curve results (Fig. 5e).
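
      The sampling-rate concern raised for the phase plots can be illustrated with a small sketch. It uses a synthetic Gaussian-shaped spike, not recorded data; the spike width (sigma of 0.1 ms), the amplitude, and the sampling rates are assumptions chosen for illustration, and low-pass filtering is not modeled.

```python
import math

def spike_mv(t_ms, peak_time_ms=1.0, sigma_ms=0.1, rest_mv=-70.0, amp_mv=100.0):
    """Synthetic Gaussian-shaped spike (illustrative only, not a recorded AP)."""
    return rest_mv + amp_mv * math.exp(-((t_ms - peak_time_ms) ** 2) / (2 * sigma_ms ** 2))

def phase_plot(rate_khz, duration_ms=2.0):
    """Return (Vm, dV/dt) pairs from a trace sampled at rate_khz."""
    dt_ms = 1.0 / rate_khz
    n = int(round(duration_ms * rate_khz)) + 1
    v = [spike_mv(i * dt_ms) for i in range(n)]
    # forward difference; dV/dt in mV/ms (equivalent to V/s)
    return [(v[i], (v[i + 1] - v[i]) / dt_ms) for i in range(n - 1)]

def peak_dvdt(rate_khz):
    """Maximal rate of rise seen in the sampled phase plot."""
    return max(dvdt for _, dvdt in phase_plot(rate_khz))

# Maximal slope of the continuous Gaussian: amp / sigma * exp(-1/2)
analytic_peak = 100.0 / 0.1 * math.exp(-0.5)

for rate in (10, 20, 50):
    print(f"{rate} kHz: peak dV/dt = {peak_dvdt(rate):.0f} mV/ms "
          f"(continuous waveform: {analytic_peak:.0f} mV/ms)")
```

      With these assumed parameters, the trace sampled at 10 kHz underestimates the maximal rate of rise relative to the 50 kHz trace, because only a few samples fall on the rising phase. How large the effect is for real spikes depends on their actual kinetics and on the filter settings.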

      Additional issues:

      The first paragraph of the Results mentioned that the recorded cells were identified by immunolabelling and axonal localization. However, neither the Results nor the Methods mention the criteria and levels of measurements of axonal arborization.

      The other issues of the first review were adequately addressed by the Authors and the manuscript improved by these changes.

    4. Reviewer #3 (Public review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences between control and mutants in both interneuron populations, although they claim a predominance in PV+ cells. These results suggest that altered PV-interneuron functions in the auditory cortex may contribute to the network dysfunctions observed in Syngap1 haploinsufficiency-related intellectual disability.

      The subject of the work is interesting, and most of the approach is rather direct and straightforward, which are strengths. There are also some methodological weaknesses and interpretative issues that reduce the impact of the paper.

      (1) Supplementary Figure 3: recording and data analysis. The data of Supplementary Figure 3 show no differences either in the frequency or amplitude of synaptic events recorded from the same cell in control (sEPSCs) vs TTX (mEPSCs). This suggests that, under the experimental conditions of the paper, sEPSCs are AP-independent quantal events.

      However, I am concerned by the high variability of the individual results included in the Figure. Indeed, several datapoints show dramatically different frequencies in control vs TTX, which may be explained by unstable recording conditions. It would be important to present these data as time course plots, so that stability can be evaluated. Also, the claim of lack of effect of TTX should be corroborated by positive control experiments verifying that TTX is working (block of action potentials, for example). Lastly, it is not clear whether the application of TTX was consistent in time and duration in all the experiments and the paper does not clarify what time window was used for quantification.

      (2) Figure 1 and Supplementary Figure 3: apparent inconsistency. If, as the authors claim, TTX does not affect sEPSCs (either in the control or mutant genotype, Supplementary Figure 3 and point 1 above), then comparing sEPSC and mEPSC in control vs mutants should yield identical results. In contrast, Figure 1 reports a _selective_ reduction of sEPSCs amplitude (not in mEPSCs) in mutants, which is difficult to understand. The proposed explanation relying on different pools of synaptic vesicles mediating sEPSCs and mEPSCs does not clarify things. If this was the case, wouldn't it also imply a decrease of event frequency following TTX addition? However, this is not observed in Supplementary Figure 3. My understanding is that, according to this explanation, recordings in control solution would reflect the impact of two separate pools of vesicles, whereas, in the presence of TTX, only one pool would be available for release. Therefore, TTX should cause a decrease in the frequency of the recorded events, which is not what is observed in Supplementary Figure 3.

      (3) Figure 1: statistical analysis. Although I do appreciate the efforts of the authors to illustrate both cumulative distributions and plunger plots with individual data, I am confused by how the cumulative distributions of Figure 1b (sEPSC amplitude) may support statistically significant differences between genotypes, but this is not the case for the cumulative distributions of Figure 1g (inter mEPSC interval), where the curves appear even more separated. A difference in mEPSC frequency would also be consistent with the data of Supplementary Fig 2b, which otherwise are difficult to reconcile. I would encourage the authors to use the Kolmogorov-Smirnov test rather than a t-test for the comparison of cumulative distributions.

      (4) Methods. I still maintain that a threshold at around -20/-15 mV for the first action potential of a train seems too depolarized (see some datapoints of Fig 5c and Fig 7c) for a healthy spike. This suggests that some cells were either in precarious conditions or that the capacitance of the electrode was not compensated properly.

      (5) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties (Figure 8d,e); however, their evoked firing properties were affected with fewer AP generated in response to the same depolarizing current injection".

      This sentence is intrinsically contradictory. Action potentials triggered by current injections are dependent on the integration of passive and active properties. If the curves of Figure 8f are different between genotypes, then some passive and/or active property MUST have changed. It is an inescapable conclusion. The general _blanket_ statement of the authors that there are no significant changes in active and passive properties is in direct contradiction with the current/#AP plot.

      (6) The phase plots of Figs 5c, 7c, and 7h suggest that the frequency of acquisition/filtering of current-clamp signals was not appropriate for fast waveforms such as spikes. The first two papers indicated by the authors in their rebuttal (Golomb et al., 2007; Stevens et al., 2021) did not perform a phase plot analysis (like those included in the manuscript). The last work quoted in the rebuttal (Zhang et al., 2023) did perform phase plot analysis, but data were digitized at a frequency of 20 kHz (not 10 kHz as incorrectly indicated by the authors) and filtered at 10 kHz (not 2-3 kHz as by the authors in the manuscript). To me, this remains a concern.

      (7) The general logical flow of the manuscript could be improved. For example, Fig 4 seems to indicate no morphological differences in the dendritic trees of control vs mutant PV cells, but this conclusion is then rejected by Fig 6. Maybe Fig 4 is not necessary. Regarding Fig 6, did the authors check the integrity of the entire dendritic structure of the cells analyzed (i.e. no dendrites were cut in the slice)? This is critical as the dendritic geometry may affect the firing properties of neurons (Mainen and Sejnowski, Nature, 1996).
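
      The Kolmogorov-Smirnov comparison of cumulative distributions suggested in point (3) can be sketched as follows. This is a minimal illustration, not the analysis used in the manuscript: the amplitude values are hypothetical, and the critical value uses the standard large-sample approximation with the coefficient 1.358 for a significance level of 0.05.

```python
import math

def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    n, m = len(x), len(y)
    d = 0.0
    for v in sorted(set(x) | set(y)):
        # empirical cumulative distribution functions evaluated at v
        fx = sum(1 for a in x if a <= v) / n
        fy = sum(1 for b in y if b <= v) / m
        d = max(d, abs(fx - fy))
    return d

def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical event amplitudes in pA (not data from the study)
control = [12.1, 14.3, 15.0, 16.2, 18.4, 19.9, 21.5, 22.0]
chet = [8.2, 9.1, 10.4, 11.0, 12.5, 13.3, 14.8, 15.1]

d = ks_statistic(control, chet)
crit = ks_critical_value(len(control), len(chet))
print(f"D = {d:.3f}, critical value at alpha 0.05 = {crit:.3f}, reject H0: {d > crit}")
```

      Note that pooling events from all cells treats every event as independent. This ignores the nesting of events within cells and of cells within mice, which is the reason the authors give for using a linear mixed model, so the two approaches answer somewhat different questions.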

    1. eLife Assessment

      This important study provides a comprehensive assessment of mitochondrial function across age and sex in mice. The strength of evidence supporting this resource is compelling, given the exhaustive number of tissues profiled and in-depth analyses performed.

    2. Reviewer #1 (Public review):

      In this study, Sarver and colleagues carried out an exhaustive analysis of the functioning of various components (Complex I/II/IV) of the mitochondrial electron transport chain (ETC) using a real-time cell metabolic analysis technique (commonly referred to as the Seahorse oxygen consumption rate (OCR) assay). The authors aimed to generate an atlas of ETC function in about 3 dozen tissue types isolated from all major mammalian organ systems. They used a recently published improvised method by which ETC function can be quantified in freshly frozen tissues. This method enabled them to collect data from almost all organ systems from the same mouse and use the many biological replicates (10 mice/experiment) required for an unbiased and statistically robust analysis. Moreover, they studied the influence of sex (male and female) and aging (young adult and old age) on ETC function in these organ systems. The main findings of this study are (1) cells in the heart and kidneys have very active ETC complexes compared to other organ systems, (2) the sex of the mice has little influence on ETC function, and (3) aging undermined mitochondrial function in most tissues, but surprisingly in some tissues aging promoted the activity of ETC complexes (e.g., quadriceps, plantaris muscle, and diaphragm).

      Comments on the second revision:

      My previous concern remains unaddressed in the new revision. As I mentioned earlier, it is crucial for the authors to include a detailed discussion on the limitations of their method, specifically how maximal respiration does not accurately reflect the actual ATP production rate. Additionally, the authors should highlight the fact that data provided in the manuscript should be interpreted with caution.

    3. Reviewer #2 (Public review):

      Summary:

      The authors utilize a new technique to measure mitochondrial respiration from frozen tissue extracts, which goes around the historical problem of purifying mitochondria prior to analysis, a process that requires a fair amount of time and cannot be easily scaled up.

      Strengths:

      A comprehensive analysis of mitochondrial respiration across tissues, sexes, and two different ages provides foundational knowledge needed in the field.

      Weaknesses:

      While many of the findings are mostly descriptive, this paper provides a large amount of data for the community and can be used as a reference for further studies. As the authors suggest, this is a new atlas of mitochondrial function in the mouse. The inclusion of a middle-aged time point and a slightly older young time point (3-6 months) would be beneficial to the study.

    4. Reviewer #3 (Public review):

      The aim of the study was to map (a) whether different tissues exhibit different metabolic profiles (this is known already), (b) what differences are found between female and male mice, and (c) how the profiles change with age. In particular, the study recorded the activity of respirasomes, i.e. the concerted activity of mitochondrial respiratory complex chains consisting of CI+CIII2+CIV, CII+CIII2+CIV or CIV alone.

      The strength is certainly the atlas of oxidative metabolism in the whole mouse body, the inclusion of the two different sexes and the comparison between young and old mice. The measurement was performed on frozen tissue, which is possible as already shown (Acin-Perez et al, EMBO J, 2020).

      Weakness: The assay reveals the maximum capacity of enzyme activity, which is an artificial situation and may differ from in vivo respiration, as the authors themselves discuss. The material used was a very crude preparation of cells containing mitochondria and other cytosolic compounds and organelles. Thus, the conditions are not well defined and the respiratory chain activity was certainly uncoupled from ATP synthesis. Preparation of more pure mitochondria and testing for coupling would allow evaluation of additional parameters: P/O ratios, feedback mechanism, basal respiration, and ATP-coupled respiration, which reflect in vivo conditions much better. The discussion is rather descriptive and cautious and could lead to some speculations about what could cause the differences in respiration and also what consequences these could have, or what certain changes imply.

      Nevertheless, this study is an important step towards this kind of analysis.

      Comments on the second revision:

      I believe this is an important and interesting area of study, although I recognise that the assay which measures maximal enzyme activity under unphysiological conditions has its limitations. Nevertheless, it does seem possible to get a first glance of the respiratory situation in the respective tissue. There is a typo in the source data (Fig. xC) for skeletal muscle.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #2:

      No further questions, but please do add a sentence or two about the lack of these additional points in the discussion as a limitation to the study.

      We have included additional “limitations of the study” in the Discussion Section.

      Reviewer #3:

      The authors have added to the discussion some critical remarks about the limitations of the study, which will help in the assessment of the conclusions.

      In sum, the manuscript has significantly improved during the revision.

      Some minor points should be changed, though.

      Page 18 marked: "What causes an age-dependent decrease in mitochondrial OXPHOS genes across tissues, however, is largely unknown." I assume, the authors do not suggest that the abundance of genes is reduced, which means elimination of DNA? Be more precise about this.

      We thank the reviewer for pointing this out. We have clarified this to mean “OXPHOS gene expression” and made a couple changes accordingly.

      Page 18 marked: a paragraph was added addressing the increase in mitochondrial respiration in the heart; this should be discussed in the light of the literature, as was done for skeletal muscle in the following paragraph.

      We have included additional paragraphs in the Discussion Section to talk about increased mitochondrial respiration in the aging heart in the context of published literature.

      Figure 2: it was asked for error bars for the OCR measurements. Response: We have added the error bars and statistical significance to revised Figure 2; however, is it correct that there are no significant differences?

      Figure 2 ranks tissues based on the OCR values within a single group of mice (male or female, young or old) and is not a comparison between male vs female, or young vs old. For this reason, no statistics were included as they are not needed here. The goal of this figure is to highlight the OCR distribution across tissues within a single sex and age group.

    1. eLife Assessment

      This study provides important insights into postnatal lung development and the mechanisms underlying bronchopulmonary dysplasia (BPD), a condition with high morbidity and mortality in newborns. Through the use of neonatal hyperoxia, cell-type-specific inactivation of Tgfbr2, and other injury models, the research focuses on the role of TGF-β signaling in BPD pathogenesis, highlighting impaired myofibroblast proliferation as a key factor. The inactivation of Ect2 in Pdgfra-lineaged cells disrupts myofibroblast cytokinesis, leading to alveolar simplification and reduced cell numbers. The use of transgenic mice and single-cell transcriptomics offers a detailed and high-quality dataset, advancing our understanding of BPD and serving as an invaluable resource for developmental biology and neonatal pulmonary research. The study's comprehensive approach, robust data, and methodological rigor make it a compelling contribution to the field, providing both mechanistic insights and a resource for further research into BPD pathogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors used both the commonly used neonatal hyperoxia model as well as cell-type-specific genetic inactivation of Tgfbr2 models to study the basis of BPD. The bulk of the analyses focus on the mesenchymal cells. Results indicate impaired myofibroblast proliferation, resulting in decreased cell number. Inactivation of Ect2 in Pdgfra-lineaged cells, preventing cytokinesis of myofibroblasts, led to alveolar simplification. Together, the findings demonstrate that disrupted myofibroblast proliferation is a key contributor to BPD pathogenesis.

      Strengths:

      Overall, this comprehensive study of BPD models advances our understanding of the disease. The data are of high quality.

      Comments on latest version:

      In the revision, the authors addressed all critiques.

    3. Reviewer #2 (Public review):

      Summary:

      In this study the authors systematically explore mechanism(s) of impaired postnatal lung development with relevance to BPD (bronchopulmonary dysplasia) in two murine models of 'alveolar simplification', namely hyperoxia and epithelial loss of TGFb signaling. The work presented here is of great importance, given the limited treatment options for a clinical entity frequently encountered in newborns with high morbidity and mortality that is still poorly understood, and the unclear role of TGFb signaling, its signaling levels, and its cellular effects during secondary alveolar septum formation, a lung structure generating event heavily impacted by BPD. The authors show that hyperoxia and epithelial TGFb signaling loss have similar detrimental effects on lung structure and mechanical properties (emphysema-like phenotype) and are associated with significantly decreased numbers of PDGFRa-expressing cells, the major cell pool responsible for generation of postnatal myofibroblasts. They then use a single-cell transcriptomic approach combined with pathway enrichment analysis for both models to elucidate common factors that affect alveologenesis. Using cell communication analysis (NicheNet) between epithelial cells and myofibroblasts they confirm increased projected TGFb-TGFbR interactions and decreased projected interactions for PDGFA-PDGFRA, and other key pathways, such as SHH and WNT. Based on these results they go on to uncover, in a series of experiments, that increased TGFb surprisingly appears reactive to postnatal lung injury and rather protective/homeostatic in nature, and the authors establish the requirement for alpha V integrins, but not the subtype alphaVbeta6, a known activator of TGFb signaling that is implicated in adult lung fibrosis. The authors then go beyond the TGFb axis evaluation to show that mere inhibition of proliferation by conditional KO of Ect2 in Pdgfra lineage results in alveolar simplification, pointing out the pivotal role of PDGFRa-expressing myofibroblasts for normal postnatal lung development.

      Strengths:

      (1) The approach including both pharmacologic and mechanistically-relevant transgenic interventions both of which produced consistent results provides robustness of the results presented here.

      (2) Further adding to this robustness is the use of moderate levels of hyperoxia at 75% FiO2, which is less extreme than 100% FiO2 frequently used by others in the field, and therefore favors the null hypothesis.

      (3) The prudent use of advanced single-cell analysis tools, such as NicheNet, to establish cell interactions through the pathways they tested and the validation of their scRNA-seq results by analysis of two external datasets. Delineation of the complexity of signals between different cell types during normal and perturbed lung development, such as attempted successfully in this study, will yield further insights into the underlying mechanism(s).

      (4) The combined readout of lung morphometric (MLI) and lung physiologic parameters generates a clinically meaningful readout of lung structure and function.

      (5) The systematic evaluation of TGFb signaling better determines the role in normal and postnatally-injured lung.

      Weaknesses:

      (1) While the study convincingly establishes the effect of lung injury on the proliferation of PDGFRa-expressing cells, differentiation is equally important. Characterization of PDGFRa expressing cells and tracking the changes in the injury models in the scRNA analysis, a key feature of this study, would benefit from expansion in this regard. PDGFRa lineage gives rise to several key fibroblast populations, including myofibroblasts, lipofibroblasts, and matrix-type fibroblasts (Collagen13a1, Collagen14a1). Lipofibroblasts constitute a significant fraction of PDGFRa+ cells, and expand in response to hyperoxic injury, as shown by others. Collagen13a1-expressing fibroblasts expand significantly under both conditions (Fig.3), and appear to contain a significant number of PDGFRa-expressing cells (Suppl Fig.1). Effects of the applied injuries on known differentiation markers for these populations should be documented. Another important aspect would be to evaluate whether the protective/homeostatic effect of TGFb signaling is by supporting differentiation of myofibroblasts. Postnatal Gli1 lineage gains expression of PDGFRa and differentiation markers, such as Acta2 (SMA) and Eln (Tropoelastin). Loss of PDGFRa expression was shown to alter Elastin and TGFb pathway related genes. TGFb signaling is tightly linked to the ECM via LTBPs, Fibrillins and Fibulins. An additional analysis in the aforementioned regards has great potential to more specifically identify the cell type(s) affected by the loss of TGFb signaling and allow analysis of their specific transcriptomic changes in response and underlying mechanism(s) to postnatal injury.

      [The authors have added in detailed transcriptomic description of the fibroblast populations.]

      (2) Of the three major lung abnormalities encountered in BPD, the authors focus on alveolarization impairment in great detail, to a very limited extent on inflammation, and not on vascularization impairment. However, this would be important not only to better capture the established pathohistologic abnormalities of BPD, but is also needed since the authors alter TGFb signaling, and inflammatory and vascular phenotypes with developmental loss of TGFb signaling and its activators have been described. Since the authors make the point about absence of inflammation in their BPD model, it will be important to show the evidence.

      [While this is an important question, assessment of these components goes beyond the scope of this paper.]

      (3) Conceptually it would be important that in the discussion the authors reconcile their findings in the experimental BPD models in light of human BPD and potential implications it might have on new ways to target key pathways and cell types for treatment. This allows the scientific community to formulate the next set of questions in a disease relevant manner.

      [The authors have amended the discussion in this regard.]

      Comments on latest version:

      This reviewer would like to thank the authors for their efforts to address the concerns, in particular the better transcriptomic description of the fibroblast populations. The reviewer is well aware of the issues with PDGFRa antibodies that work on mouse tissue and also the problem with available reporters and lineage tracers in terms of haploinsufficiency.

      There are no further concerns from this reviewer's side.

    4. Reviewer #3 (Public review):

      This paper seeks to understand the role of alveolar myofibroblasts in the abnormal lung development after saccular stage injury.

      Strengths:

      (1) Multiple models of neonatal injury are used, hyperoxia and transgenic models that target alveolar myofibroblasts.

      (2) The authors integrate their data with prior published single-cell data from neonatal hyperoxia injury models and demonstrate concordant findings.

      Weaknesses:

      (1) As the authors acknowledge in the discussion, there are no spatial and temporal validation data of the single-cell findings. As the ductal myofibroblasts have many overlapping genes, localizing and quantifying the loss of these cells in injury as a plausible mechanistic driver would greatly strengthen the conclusion.

      (2) As they note in their response, this proved to be technically difficult and current Pdgfra-lineage trace tools are not without their own limitations.

      Summary:

      Taken together, this manuscript provides a rich data set from a model of irreversible neonatal lung injury. The single-cell analysis methods are well-articulated and the limitations are acknowledged, allowing this paper to provide a foundation for future work to spatially and temporally validate these claims.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We have responded to these criticisms below and have revised the main text and figures. Here, we outline the major points of our responses:

      (1) The reviewers asked for more clarification regarding cell type annotation in the lung mesenchyme as shown in Figure 3C. We have included a new supplementary figure (Supplementary Figure 2) which shows differentially expressed genes amongst these mesenchymal cell subsets using a variety of visualization tools including a heatmap, UMAP plots, and the dotplot which was originally shown in Supplementary Figure 1D. The other supplemental figures have been re-numbered.

      (2) We acknowledge the lack of consensus in the field regarding the nomenclature of fibroblast subsets in the developing mouse lung. We are not attempting to define new subsets, but rather we adopted annotations based on previously published work. Specifically, we used Seurat to define mesenchymal cell clusters and then compared the gene expression patterns of these clusters to published work by Hurskainen et al. (Bernard Thebaud’s group) and Narvaez Del Pilar et al. (Jichao Chen’s group). We acknowledge these annotations might conflict with other published data, but any approach to choosing a cell label would be subject to scrutiny. For example, Col13a1 fibroblasts share markers with cells which have been defined by others as lipofibroblasts or alveolar fibroblasts. Similarly, Col14a1 fibroblasts appear to share markers with matrix fibroblasts. Further work is clearly needed to address these discrepancies, and we hope that making our data publicly available will help that effort.

      (3) The reviewers asked us to interrogate changes in canonical markers of fibroblast subsets (i.e. lipofibroblasts, matrix fibroblasts) to address whether the apparent loss of myofibroblasts could be explained by a change in myofibroblast specification/differentiation. We have included these data in the responses, but because we are unable to draw any clear conclusions from these results, we do not feel these data warrant inclusion in the manuscript/figures.

      (4) As highlighted in the eLife assessment, our study does not include tissue validation (i.e. immunohistochemistry) of myofibroblast markers to distinguish whether the loss of myofibroblasts is attributable to lack of proliferation and/or changes in differentiation/specification. We spent considerable time over the past few months attempting to address these questions, however we were unable to produce convincing PDGFRa staining on tissues that we had collected during our original studies. Without PDGFRa staining, we regretfully could not co-stain for other useful markers to assess proliferation (EdU), apoptosis (TUNEL or caspase), or fibroblast function/specification (ACTA2, SM22a/TAGLN, ADRP, etc). We suspect that these experiments would require optimization of tissue fixation/processing at the time of harvest or the inclusion of a Pdgfra lineage tool for better identification of these cells by immunohistochemistry. Given that the majority of Pdgfra lineage tools require a knock-in/knock-out approach, data generated using these tools should be interpreted with caution given our results here show that Pdgfra-haploinsufficiency alone worsens disease outcomes after hyperoxia exposure.

      In summary, we have addressed several concerns raised by the reviewers and have attempted to perform some of the additional experiments suggested.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors used both the commonly used neonatal hyperoxia model as well as cell-type-specific genetic inactivation of Tgfbr2 models to study the basis of BPD. The bulk of the analyses focus on the mesenchymal cells. Results indicate impaired myofibroblast proliferation, resulting in decreased cell number. Inactivation of Ect2 in Pdgfra-lineaged cells, preventing cytokinesis of myofibroblasts, led to alveolar simplification. Together, the findings demonstrate that disrupted myofibroblast proliferation is a key contributor to BPD pathogenesis.

      Strengths:

      Overall, this comprehensive study of BPD models advances our understanding of the disease. The data are of high quality.

      Weaknesses:

      The critiques are mostly minor and can be addressed without extensive experimentation.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors systematically explore the mechanism(s) of impaired postnatal lung development with relevance to BPD (bronchopulmonary dysplasia) in two murine models of 'alveolar simplification', namely hyperoxia and epithelial loss of TGFb signaling. The work presented here is of great importance, given the limited treatment options for a clinical entity frequently encountered in newborns with high morbidity and mortality that is still poorly understood, and the unclear role of TGFb signaling, its signaling levels, and its cellular effects during secondary alveolar septum formation, a lung structure generating event heavily impacted by BPD. The authors show that hyperoxia and epithelial TGFb signaling loss have similar detrimental effects on lung structure and mechanical properties (emphysema-like phenotype) and are associated with significantly decreased numbers of PDGFRa-expressing cells, the major cell pool responsible for generation of postnatal myofibroblasts. They then use a single-cell transcriptomic approach combined with pathway enrichment analysis for both models to elucidate common factors that affect alveologenesis. Using cell communication analysis (NicheNet) between epithelial and myofibroblasts they confirm increased projected TGFb-TGFbR interactions and decreased projected interactions for PDGFA-PDGFRA, and other key pathways, such as SHH and WNT. Based on these results they go on to uncover in a sequence of experiments that surprisingly, increased TGFb appears reactive to postnatal lung injury and rather protective/homeostatic in nature, and the authors establish the requirement for alpha V integrins, but not the subtype alphaVbeta6, a known activator of TGFb signaling and implicated in adult lung fibrosis. 
The authors then go beyond the TGFb axis evaluation to show that mere inhibition of proliferation by conditional KO of Ect2 in Pdgfra lineage results in alveolar simplification, pointing out the pivotal role of PDGFRa-expressing myofibroblasts for normal postnatal lung development.

      Strengths:

      (1) The approach including both pharmacologic and mechanistically-relevant transgenic interventions both of which produced consistent results provides robustness of the results presented here.

      (2) Further adding to this robustness is the use of moderate levels of hyperoxia at 75% FiO2, which is less extreme than 100% FiO2 frequently used by others in the field, and therefore favors the null hypothesis.

      (3) The prudent use of advanced single-cell analysis tools, such as NicheNet to establish cell interactions through the pathways they tested and the validation of their scRNA-seq results by analysis of two external datasets. Delineation of the complexity of signals between different cell types during normal and perturbed lung development, such as attempted successfully in this study, will yield further insights into the underlying mechanism(s).

      (4) The combined readout of lung morphometric (MLI) and lung physiologic parameters generates a clinically meaningful readout of lung structure and function.

      (5) The systematic evaluation of TGFb signaling better determines the role in normal and postnatally-injured lungs.

      Weaknesses:

      (1) While the study convincingly establishes the effect of lung injury on the proliferation of PDGFRa-expressing cells, differentiation is equally important. Characterization of PDGFRa expressing cells and tracking the changes in the injury models in the scRNA analysis, a key feature of this study, would benefit from expansion in this regard. PDGFRa lineage gives rise to several key fibroblast populations, including myofibroblasts, lipofibroblasts, and matrix-type fibroblasts (Collagen13a1, Collagen14a1). Lipofibroblasts constitute a significant fraction of PDGFRa+ cells, and expand in response to hyperoxic injury, as shown by others. Collagen13a1-expressing fibroblasts expand significantly under both conditions (Figure 3), and appear to contain a significant number of PDGFRa-expressing cells (Suppl Fig.1). Effects of the applied injuries on known differentiation markers for these populations should be documented. Another important aspect would be to evaluate whether the protective/homeostatic effect of TGFb signaling is supporting the differentiation of myofibroblasts. Postnatal Gli1 lineage gains expression of PDGFRa and differentiation markers, such as Acta2 (SMA) and Eln (Tropoelastin). Loss of PDGFRa expression was shown to alter Elastin and TGFb pathway-related genes. TGFb signaling is tightly linked to the ECM via LTBPs, Fibrillins, and Fibulins. An additional analysis in the aforementioned regard has great potential to more specifically identify the cell type(s) affected by the loss of TGFb signaling and allow analysis of their specific transcriptomic changes in response and underlying mechanism(s) to postnatal injury.

      We attempted to conduct additional analyses on our sequencing data to evaluate the impact of lung injury on the differentiation of Pdgfra-expressing cells towards other fibroblast lineages. To specifically address the impact of hyperoxia on fibroblast differentiation, we subsetted wildtype cells collected at the P7 timepoint (while pups were still undergoing hyperoxia treatment) from the larger data set. Shown below are several Violin Plots comparing gene expression between RA and O2 conditions across the mesenchymal populations.

      Although there are some interesting observations in this analysis, we could not identify a consistent theme from these data which could clearly answer the reviewers’ questions. We see a clear reduction of Pdgfra and Eln in both myofibroblast subsets with hyperoxia, which supports our findings of reductions in the myofibroblast subsets. Acta2 and Tagln appear slightly lower in alveolar myofibroblasts, but both are higher in ductal myofibroblasts. Interestingly, both Acta2 and Tagln are higher in Col14a1 fibroblasts with hyperoxia. The functional relevance of these data is unclear because there appears to be higher per-cell expression of Acta2 in ductal myofibroblasts while the relative contribution of these cells is reduced (Figure 3D-E). Col14a1 fibroblasts show increased Acta2 and Tagln expression and are slightly increased in proportion at P7 with hyperoxia treatment (Figure 3D), albeit to a much lesser degree compared to Col13a1 fibroblasts.

      Author response image 1.

      Markers of ductal myofibroblasts including Hhip, Cdh4, and Aspn all appear lower with hyperoxia. Interestingly Plin2 expression is only slightly increased in Col13a1 fibroblasts with hyperoxia treatment, and there is also increased expression in alveolar myofibroblasts. Tcf21 is another marker commonly used to identify lipofibroblasts and its expression is similarly increased in myofibroblasts during hyperoxia, although its expression is conversely lower in Col13a1 and Col14a1 fibroblasts in our data. Overall, these data would appear consistent with recently published data by Ricetti et al. in which the authors observed an increase in lipofibroblast gene signatures and reduced myofibroblast gene signatures with hyperoxia treatment.

      Author response image 2.

      Author response image 3.

      The ability of our data to clearly identify changes in cell fate differentiation is limited by our use of Seurat to define cell clusters because these methods are likely to mask subtle gene expression changes in a small number of cells nested within a parent cluster. In the example above with Plin2, the change in Plin2 expression within myofibroblasts is not significant enough for Seurat to pull these cells out from their parent clusters to define a different lineage, nor are these cells similar enough in their current moment in time to be considered Col13a1 fibroblasts or lipofibroblasts. Increasing the dimensions used to define Seurat clusters might be sufficient to identify this subset of cells as a distinct cluster, however this approach would come at the expense of creating several more cell subsets with increasingly small populations which would be difficult to further analyze.
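
      One way to look for the kind of within-cluster shift described above, without re-clustering, is to tabulate the fraction of cells in each existing cluster that express a gene above a threshold, split by condition. The following is only a minimal sketch under stated assumptions: the record layout (`cluster`, `condition`, `expr`), the helper name `fraction_expressing`, and every number in it are hypothetical toy values for illustration, not the study's data or its Seurat workflow.

```python
from collections import defaultdict


def fraction_expressing(cells, gene, threshold=0.0):
    """Return {(cluster, condition): fraction of cells with gene > threshold}.

    `cells` is a list of dicts with hypothetical keys:
    "cluster", "condition", and "expr" (a gene -> expression mapping).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cell in cells:
        key = (cell["cluster"], cell["condition"])
        totals[key] += 1
        # A gene missing from the mapping is treated as not detected.
        if cell["expr"].get(gene, 0.0) > threshold:
            positives[key] += 1
    return {key: positives[key] / totals[key] for key in totals}


# Toy records only: values are invented to exercise the function.
cells = [
    {"cluster": "alveolar myoFB", "condition": "RA", "expr": {"Plin2": 0.0}},
    {"cluster": "alveolar myoFB", "condition": "RA", "expr": {"Plin2": 0.0}},
    {"cluster": "alveolar myoFB", "condition": "RA", "expr": {"Plin2": 1.2}},
    {"cluster": "alveolar myoFB", "condition": "RA", "expr": {"Plin2": 0.0}},
    {"cluster": "alveolar myoFB", "condition": "O2", "expr": {"Plin2": 1.5}},
    {"cluster": "alveolar myoFB", "condition": "O2", "expr": {"Plin2": 0.0}},
    {"cluster": "alveolar myoFB", "condition": "O2", "expr": {"Plin2": 2.1}},
    {"cluster": "alveolar myoFB", "condition": "O2", "expr": {"Plin2": 0.0}},
]

result = fraction_expressing(cells, "Plin2")
for (cluster, condition), frac in sorted(result.items()):
    print(f"{cluster} / {condition}: {frac:.2f} of cells express Plin2")
```

      A summary of this kind keeps the parent cluster labels fixed, so a subpopulation that gains expression of a marker appears as a change in the expressing fraction rather than as a new cluster. It does not by itself establish a change in cell fate.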

      One alternative approach to address these questions regarding differentiation might include using pseudo-time analysis of our sequencing data to predict cell lineage. Unfortunately, these analyses are beyond the scope of our current study, but we hope that our public data set can be used by investigators hoping to utilize this approach. Another method to address these questions could utilize a pulse-chase lineage experiment where one could label Pdgfra-expressing cells at the onset of injury and compare the differentiation of these labeled cells following injury. Li et al. conducted a similar experiment with hyperoxia in which Pdgfra-expressing cells were labeled during embryonic development and then postnatally following hyperoxia exposure. The authors noted a decrease in both lineaged myofibroblasts and lineaged lipofibroblasts and concluded that Pdgfra-lineaged cells were lost with hyperoxia treatment rather than undergoing aberrant differentiation. While these experiments likely have their own caveats related to the timing and efficiency of labeling, they represent a more conclusive approach to addressing differences in cell specification as compared to our sequencing- and flow cytometry-based approaches.

      Author response image 4.

      Author response image 5.

      (2) Of the three major lung abnormalities encountered in BPD, the authors focus on alveolarization impairment in great detail, to a very limited extent on inflammation, and not on vascularization impairment. However, this would be important not only to better capture the established pathohistologic abnormalities of BPD, but also it is needed since the authors alter TGFb signaling, and inflammatory and vascular phenotypes with developmental loss of TGFb signaling and its activators have been described. Since the authors make the point about the absence of inflammation in their BPD model, it will be important to show the evidence.

      We acknowledge that vascular changes significantly contribute to BPD pathogenesis, however our study was not designed to adequately characterize changes in vascular/endothelial cells. We were motivated to focus on the lung mesenchyme after observing a dramatic loss of PDGFRa+ cells with our initial characterization of the hyperoxia injury model (Figure 2). At the onset of our study, the existing publicly available data did not contain enough mesenchymal cells for in-depth analysis. To generate new observations and hypotheses within the lung mesenchyme we enriched our single cell prep for mesenchymal cells at the time of FACS-sorting to ensure we would have sufficient cell numbers for downstream analysis.

      (3) Conceptually it would be important that in the discussion the authors reconcile their findings in the experimental BPD models in light of human BPD and the potential implications it might have on new ways to target key pathways and cell types for treatment. This allows the scientific community to formulate the next set of questions in a disease-relevant manner.

      We have edited text in the discussion to address this point.

      Reviewer #3 (Public Review):

      Summary:

      This paper seeks to understand the role of alveolar myofibroblasts in abnormal lung development after saccular stage injury.

      Strengths:

      Multiple models of neonatal injury are used, including hyperoxia and transgenic models that target alveolar myofibroblasts.

      Weaknesses:

      There are several weaknesses that leave the conclusions significantly undersupported by the data as presented:

      (1) There is no validation of the decreased number of myofibroblasts suggested by flow cytometry/scRNAseq at the level of the tissue. Given that multiple groups have reported increased myofibroblasts (aSMA+ fibroblasts) in humans with BPD and in mouse models, demonstrating a departure from prior findings with tissue validation in the mouse models is essential. There are many reasons for decreased numbers of a subpopulation by flow cytometry, most notably that injured cells may be less likely to survive the cell sorting process.

      Unfortunately, we were unable to produce convincing PDGFRa staining on tissues that we had collected during our original studies. Without PDGFRa staining, we regretfully could not co-stain for other useful markers to assess proliferation (EdU), apoptosis (TUNEL or caspase), or fibroblast function/specification (aSMA/ACTA2, SM22a/TAGLN, ADRP, etc). We suspect that these experiments would require optimization of tissue fixation/processing at the time of harvest or the inclusion of a Pdgfra lineage tool for better identification of these cells by immunohistochemistry. Given that the majority of Pdgfra lineage tools require a knock-in/knock-out approach, data generated using these tools should be interpreted with caution given our results here show that Pdgfra-haploinsufficiency alone worsens disease outcomes after hyperoxia exposure.

      Our single-cell data show increased expression of Acta2 and Tagln in several populations (see the plots above), which might be consistent with the increased aSMA staining that others have observed in these settings. Interestingly, the transcripts of both genes are reduced in alveolar myofibroblasts while increased in ductal myofibroblasts, Col13a1 fibroblasts, Col14a1 fibroblasts, and vascular smooth muscle. We did not include aSMA antibody staining in our flow cytometry experiments, but this would certainly add value to future attempts to characterize the phenotypic changes occurring during these injury models.

      (2) The hallmark genes used to define the subpopulations are not given in the single-cell data. As the definition of fibroblast subtypes remains an area of unsettled discussion in the field, it is possible that the decreased number is due to classification and not a true difference. Tissue validation and more transparency in the methods used for single-cell sequencing would be critical here.

      See response above and new Supplemental Figure 2.

      (3) There is an oversimplification of neonatal hyperoxia as a "BPD model" used here without a reference to detailed prior work demonstrating that the degree and duration of hyperoxia dramatically change the phenotype. For example, Morty et al have shown that hyperoxia of 85% or more x 14 days is required to demonstrate the septal thickening observed in severe human BPD. Other than one metric of lung morphometry (MLI), which is missing units on the y-axis and flexivent data, the authors have not fully characterized this model. Prior work comparing 75% O2 exposure for 5, 8, or 14 days shows that in the 8-day exposed group (similar to the model used here), much of the injury was reversible. What evidence do the authors have that hyperoxia alone is an accurate model of the permanent structural injury seen in human BPD?

      At the onset of our studies, we noted that several groups were using widely variable protocols ranging from 60-100% O2 exposure. Morty et al. have indeed conducted thorough experiments to characterize various different hyperoxia exposure protocols. In their 2017 study (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5312005/) they showed that 85% O2 from P1-P7 was sufficient to produce increased septal thickness compared to control mice, and this change was comparable to P1-P14 exposure with 85% O2. Interestingly, they also noted that some therapeutic interventions could rescue disease caused by 60% O2 but not 85% O2 exposure. Our criteria in choosing a treatment protocol were: (1) nursing dams and pups survived hyperoxia exposure, (2) injury was reproducible across cohorts, and (3) injury was not reversible simply by recovering in room air. We found that recent work utilizing 75% O2 exposure was sufficient to cause the alveolar simplification phenotype which we sought to investigate. In our hands, we did not observe mortality of nursing dams or pups except for litters lost to cannibalism/failure of cross-fostering.

      We are confident that the injury caused by our hyperoxia protocol is not reversible simply by recovering mice in room air. Several groups have phenotyped mice at P4, P10, or P14 immediately following the conclusion of hyperoxia treatment. To ensure that we were studying a lasting, irreversible phenotype, we conducted our endpoint studies (morphometry and lung physiology) at P40. Because mice continue to undergo alveolarization until ~P36-P39, we reasoned that this additional recovery time following cessation of hyperoxia would allow for spontaneous recovery if this injury was transient. Additionally, shown below are unpublished flexiVent data in which mice were treated for 10 days with 75% O2 and recovered until analysis at 10 weeks of age. These results are entirely consistent with the flexiVent data we have included in the manuscript, and the persistence of lung physiologic changes in adult mice suggests the presence of permanent underlying structural changes. We did not conduct morphometry/MLI studies at later timepoints, but we have no reason to suspect a different outcome given the clear results from lung physiology.

      Author response image 6.

      (4) Thibeault et al published a single-cell analysis of neonatal hyperoxia in 2021, with seemingly contrasting findings. How does this dataset compare in context?

      Our data are complementary to the single-cell analysis published by Thebaud et al. We included a re-analysis of their mesenchymal data in Supplementary Figure 2 which shows they also observed a relative decrease in myofibroblast clusters at the P7 and P14 timepoints following hyperoxia treatment. Figure 4 of their paper highlights the top differentially expressed genes between RA and O2 in Col13a1 FB and myofibroblasts, and we observe nearly identical findings in our data set within each of these clusters. Below we have created dotplots of P7 wildtype samples for the same selected genes shown in Figure 4G of the Thebaud et al. paper. It is important to note that their clustering pooled all myofibroblasts into one cluster, while our data are divided into alveolar myofibroblasts and ductal myofibroblasts. The other difference is that their data set includes all timepoints (P3, P7, and P14) pooled for display, while the plot we selected for simplicity here includes only P7 cells. From these data we can see that the general trends are identical to those observed by Thebaud et al., and the differences in genes such as Acta2 can be accounted for by the different changes observed in the different myofibroblast clusters. This matches what is shown in the violin plots above, namely that Acta2 is reduced in hyperoxia in alveolar myofibroblasts while increased in the ductal myofibroblasts.

      Author response image 7.

      Alveolar myoFB

      Author response image 8.

      Ductal myoFB

      One difference between our two datasets is the relative contribution of myofibroblast and Col13a1 fibroblasts to the entire mesenchymal population of cells. Over 50% of all mesenchymal cells in our preps consist of myofibroblasts, while most of their mesenchymal cells are Col13a1 fibroblasts. These differences are likely accounted for by differences in tissue digestion and cell preparation protocols. However, despite these differences, their data show the same trends of decreased myofibroblasts and a relative expansion in Col13a1 fibroblasts.
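
      The point about relative contribution can be made concrete with a small calculation: cluster proportions are computed per sample, so two preparations can start from different baseline compositions and still show the same direction of change between conditions. This is a minimal sketch, assuming hypothetical per-cell cluster labels; the function name `cluster_proportions` and the toy counts below are invented for illustration and are not taken from either data set.

```python
from collections import Counter


def cluster_proportions(labels):
    """Return {cluster: fraction of all cells} for a list of per-cell labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


# Toy label lists only: counts are invented, not measured values.
ra_labels = ["myoFB"] * 6 + ["Col13a1 FB"] * 3 + ["Col14a1 FB"] * 1
o2_labels = ["myoFB"] * 4 + ["Col13a1 FB"] * 5 + ["Col14a1 FB"] * 1

ra = cluster_proportions(ra_labels)
o2 = cluster_proportions(o2_labels)

# Report the change in each cluster's share between conditions.
for cluster in sorted(ra):
    delta = o2.get(cluster, 0.0) - ra[cluster]
    print(f"{cluster}: RA {ra[cluster]:.2f} -> O2 {o2.get(cluster, 0.0):.2f} ({delta:+.2f})")
```

      Because these are fractions of a sorted and digested preparation, they describe composition within each sample rather than absolute cell numbers in the tissue.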

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1, for the hyperoxia model, it is informative to have the analysis done at P40, while most of the previous studies using this model focus on outcomes shortly after the end of the hyperoxia regimen. The authors state "we did not see evidence of fibrosis, scarring, or inflammation." It will be helpful to include data supporting this conclusion, especially ACTA2, CTHRC1, and CD45 staining.

      We did not conduct trichrome staining or hydroxyproline assays to quantify the absence of fibrotic changes because there were no gross histologic changes consistent with scarring or fibrosis by H&E staining. We have amended the text to say “we did not see evidence of fibrosis or scarring” since we did not publish any changes to characterize the immune cell compartment.

      (2) Figure 3, single cell analysis, naming of the clusters is confusing. Is "alveolar myofibroblasts" the same as "secondary crest myofibroblasts"? Is "Col13a1 FB" the same as "alveolar fibroblasts" and "Col14a1 FB" the same as "adventitial fibroblasts"? The loss of myofibroblasts is intriguing because, by staining, there is an increase of ACTA2+ cells. Are ACTA2+ cells not myofibroblasts in scRNAseq data?

      As mentioned in responses above, we used Jichao Chen’s nomenclature of “alveolar myofibroblasts” and “ductal myofibroblasts”, but we agree that the former cluster is most consistent with “secondary crest myofibroblasts”. To distinguish the two remaining clusters of fibroblasts we used the same nomenclature as found in Thebaud et al.’s single-cell data set: “Col13a1 FB” and “Col14a1 FB”. The Col13a1 FB cluster is most consistent with “alveolar fibroblasts” and contains high expression of several genes used to define “lipofibroblasts”, though it is unclear whether the latter may represent a subcluster within the Col13a1 FB cluster.

      As shown above, Acta2 is expressed broadly within the lung mesenchyme with highest levels found in myofibroblasts and smooth muscle cells.

      (3) Phosphorylated SMAD2/3 staining (e.g. Cell Signaling antibody) in the two models will be informative to show where TGF signaling activity is altered.

      We have not been successful in using SMAD2/3 staining to infer changes in TGFb signaling at the resolution needed to address this question. Other groups have shown qPCR and western blot data for SMAD2/3 signaling from whole lung extracts, but these approaches lack cell-type specificity and do not address spatial changes. We attempted to incorporate pSMAD2/3 staining into our flow cytometry experiments, but the staining protocol did not work in our hands.

      (4) Is cell death increased in the multiple models that showed simplification?

      While our EdU experiments address proliferation, we were unable to perform PDGFRa and TUNEL/caspase co-staining by histology to address apoptosis/cell death in our different models. Shown here is data from P7 wildtype mice in which Cdkn1a (promoting arrest of cell cycle), and pro-apoptotic genes Bax, Bak1, and Fas are all upregulated in hyperoxia in several mesenchymal cell populations including myofibroblasts.

      Author response image 9.

      (5) Wording: "These data suggest that avb6 does not play a role in TGFb activation during normal development or neonatal hyperoxia, while av-integrins in the lung mesenchyme are required for normal development and play a protective role in response to hyperoxia." The first half of the sentence is missing a reference to the epithelium.

      Text now reads “These data suggest that epithelial avb6 does not play a role…”

      Reviewer #2 (Recommendations For The Authors):

      The reviewer greatly appreciates the work presented here, especially the hard task of addressing combined signaling pathway input into key mesenchymal cell types during an essential expansion of alveolar surface area in postnatal lung and its effect upon disturbance.

      The issues of concern are mentioned in the public review and are expanded upon below:

      (1) Expanded characterization of PDGFRa+ expressing cells in the scRNA dataset is needed (see public review). Also included should be some of the key myofibroblast genes (elastin, Acta2, etc.) and their changes in the relevant cell populations. It would be important to show (at least at the transcriptional level) that myofibroblast differentiation is impaired if the author claims that the alveolarization defect is due to functional myofibroblast impairment. Furthermore, Ect2 expression and changes with treatments should be shown for the different cell populations (relevant to Figure 9).

      See responses above

      (2) The authors stated that they did not find evidence of fibrosis, scarring, and inflammation, but did not provide data to support this statement. Given the importance of at least the inflammation component in BPD, the absence of inflammation needs to be shown, especially in the model using the TGFBR2-cKO mouse, where at least their data show a trend to increased CD45 cell numbers (Figure 2), and upregulated inflammatory upstream regulators (IL10, IFNa, IKBKB, CEBPB upregulated) in the IPA (Figure 3). BAL and/or tissue by flow or IHC have been used to assess different immune cell populations. In terms of evaluation of vascular impairment, the single-cell data set contains endothelial cells, vascular smooth muscle, and pericytes, which allows interrogation following the two different types of injury (hyperoxia and TGFbR2 cKO) used for the scRNA-seq experiments.

      A full characterization of the immune cell or vascular/endothelial cell compartment within our models is beyond the scope of this current study as we were focusing on the shared changes observed within the lung mesenchyme. None of these compartments exist in isolation, so of course there are likely to be correlative and/or causative changes observed in each of the different models which we studied. We did consider further phenotypic analysis of the immune cells by flow cytometry within our different models, but deferred these experiments for future studies. As mentioned earlier we have omitted the reference to “no inflammation”.

      (3) The authors should report several litters per experiment and experimental group, mortality in the groups, and if present, visualize using e.g. Kaplan-Meier curves. The switch of the mothers during treatment, the early postnatal injections and treatments, and variability in outcome measures between different litters have to be anticipated. Therefore at least 2 litters, but preferably 3 litters per experiment should be examined, to show reproducibility.

      All experiments were conducted with at least 2-3 contemporaneous litters in each treatment group as this was necessary to have enough animals per treatment condition/group to achieve statistical significance. This was essential as all experiments were conducted on the C57BL/6 background where litter sizes are typically 6-8 pups in our colony. We did not encounter any maternal mortality related to hyperoxia exposure while rotating between hyperoxia and normoxia every 48 hrs. Loss of pups in our experiments was mostly due to cannibalism either immediately after birth or from neglect due to failure of cross-fostering.

      (4) The reviewer is concerned about using PBS as a control for experiments involving antibody treatment, in this case, 1D11. The use of an isotype IgG would be the most appropriate and convincing control. In this case, an isotype-matched murine IgG1 control (13C4) has already been generated and is commercially available. While the reviewer does not suggest repeating all experiments, at least one small experiment showing that control IgG does not alter the lung phenotype with hyperoxia when compared with 1D11 would be important.

      We appreciate the reviewer’s suggestion and will consider an isotype antibody comparison in future studies. While not directly comparing 1D11 to isotype, we can share data in which we compared PBS to a different antibody. In this experiment, we attempted to use antibody blockade during the first 10 days of life while mice were undergoing hyperoxia treatment to target a specific component of the TGFb pathway. We observed no difference in outcomes either in RA or O2 when comparing PBS to xxx antibody. We cannot share the antibody identity due to intellectual property reasons, however additional studies confirmed that this antibody likely had no impact due to poor in vivo blocking activity.

      Author response image 10.

      (5) While inhibited proliferation is one possible explanation for the decrease of PDGFRa expression in the injured mice, there should be consideration of increased and/or premature apoptosis (before the physiologically observed wave P14-P20) as another reason. Also, do the authors propose that only proliferation results in alveolarization impairment, but differentiation plays no significant role here? If that is the case, that would mean that there are some fully differentiated myofibroblasts in the alveolar septa, but not enough to create the multitude of alveolar septal walls. Have the authors evaluated the decrease in secondary alveolar septa formed per alveolar airspace? This measure would give some sense of whether septum initiation was prevented or whether septa were formed but are structurally abnormal, e.g. due to altered ECM (suspected decrease in Elastin and SMA expression, if myofibroblast differentiation was impaired) or cell content (suspected decrease in myofibroblasts and increase of other cell types, such as lipofibroblasts).

      Apoptosis/cell death are likely to play a role in addition to inhibited proliferation. See violin plots shown above with cell cycle arrest and pro-apoptotic genes upregulated within the mesenchyme. Because we were unable to optimize tissue sections/staining with the samples collected during the early time points of our experiments (i.e. P4, P7, P10, P14), we are unable to co-stain for markers of apoptosis and answer this question in a direct manner. Future experiments will focus on additional characterization of these early changes with particular attention to altered fibroblast phenotypes within the alveolar septa.

      (6) An illustration depicting key cells and the pathways involved in cartoon format would be a useful addition and visualize the important conclusions of this paper for the reader.

      We appreciate this suggestion but think the results are sufficiently straightforward that a summary cartoon would not add much.

      Figure 4A: the legend appears to be switched. The gray square seems to align with the epithelial ligands, while the blue square aligns with receptors.

      Thank you for identifying this mistake – fixed.

      Names of transgenic lines used through manuscript:

      Please use the correct name, as per JAX would be either Gli1tm3(cre/ERT2)Alj/J or Gli1-CreERT2.

      Please use the correct name, as per JAX would be either Pdgfratm1.1(cre/ERT2)Blh/J or Pdgfrα-CreERT2.

      PDGFRa-CRE would be JAX# 013148.

      The transgenic lines have been noted in the methods, and we have edited the text of the manuscript to reflect the correct names of these lines. For supplementary figure 4, which compares Gli1-CreERT2 to Pdgfrα-CreERT2, we left our prior nomenclature intact because it better reflects that each of these lines is haploinsufficient at its targeted locus, and that the controls are cre-negative littermates.

      We did not use the PDGFRa-CRE line (JAX# 013148).

      Reviewer #3 (Recommendations For The Authors):

      - More transparency about the single-cell analysis is required: 1) how are cell types and clusters defined? 2) what strategy was used for ambient RNA? 3) how do the controls compare with recently published mouse developmental datasets? 4) how does this model compare with the single-cell dataset published by Thibeault et al in 2021 (neonatal hyperoxia x 14 days with multiple time points used)?

      See responses above.

      - Tissue level validation of these findings is essential by RNA ISH or IF. While validation that the same process is at play in human tissue would be ideal, if this is not available, the conclusions must be tempered in the discussion.

      See responses above.

      - Is this more mild neonatal injury reversible in mice? As noted above, more characterization of this model (and placing it in the context of other more widely published models would be helpful).

      See responses above.

    1. eLife Assessment

      This study reveals that the malaria parasite protein PfHO, although lacking typical heme oxygenase activity, is essential for the survival of Plasmodium falciparum. Structural and localization analyses demonstrated that PfHO plays a critical role in maintaining the apicoplast, specifically in gene expression and biogenesis, suggesting an adaptive function for this protein in parasite biology. While the findings convincingly support the authors' claims, further investigation into apicoplast gene expression and the specific function of PfHO remains a future challenge. The topic and results are important and will be of interest to researchers studying various aspects of malaria, Plasmodium physiology, host-pathogen interactions, and heme metabolism.

    2. Reviewer #1 (Public review):

      Malaria parasites detoxify free heme molecules released from digested host hemoglobins by biomineralizing them into inert hemozoin. Thus, why malaria parasites retain PfHO, a dead enzyme that loses the capacity of catabolizing heme, is an outstanding question that has puzzled researchers for more than a decade. In the current manuscript, the authors addressed this question by first solving the crystal structure of PfHO and aligning it with structures of other heme oxygenase (HO) proteins. They found that the N-terminal 95 residues of PfHO, which failed to crystallize due to their disordered nature, may serve as signal and transit peptides for PfHO subcellular localization. This was confirmed by subsequent microscopic analysis with episomally expressed PfHO-GFP and a GFP reporter fused to the first 83 residues of PfHO (PfHO N-term-GFP). To investigate the functional importance of PfHO, the authors generated an anhydrotetracycline (aTC) controlled PfHO knockdown strain. Strikingly, the parasites lacking PfHO failed to grow and lost their apicoplast. Finally, by chromatin immunoprecipitation (ChIP), quantitative PCR/RT-PCR and growth assays, the authors showed that both the cognate N-terminus and HO-like domain were required for PfHO function as an apicoplast DNA interacting protein.

      The authors systematically performed multidisciplinary approaches to address this difficult question: what is the function of this enzymatically dead PfHO? I enjoyed reading this manuscript and its thoughtful discussion. This study is not only of clinical importance for antimalarial treatments but also deepens our understanding of protein function evolution.

      The authors proposed that PfHO interacts with apicoplast genome DNA via the electropositive N-terminus. Interestingly, these positively charged residues are not conserved between Plasmodium, Theileria and Babesia. I will be curious to follow the authors' future work to investigate the function of this electropositive N-terminus, possibly by comparative and mutagenesis analysis?

    3. Reviewer #2 (Public review):

      Summary:

      Blackwell et al. investigated the structure, localization and physiological function of Plasmodium falciparum (Pf) heme oxygenase (HO). Pf and other malaria parasites scavenge and digest large amounts of hemoglobin from red cells for sustenance. To counter the potentially cytotoxic effects of heme, it is biomineralized into hemozoin and stored in the food vacuole. Another mechanism to counteract heme toxicity is through its enzymatic degradation via heme oxygenases. However, it was previously found by the authors that PfHO lacks the ability to catalyze heme degradation, raising the intriguing question of what the physiological function of PfHO is. In the current contribution, the authors determine that PfHO localizes to the apicoplast, determine its targeting sequence, establish the essentiality of PfHO for parasite viability, and determine that PfHO is required for proper maintenance of apicoplasts and apicoplast gene expression. In sum, the authors establish an essential physiological function for PfHO, thereby providing new insights into the role of PfHO in plasmodium metabolism.

      Strengths:

      The studies are rigorously conducted and the results of the experiments unambiguously support a role for PfHO as being an apicoplast targeted protein required for parasite viability and maintenance of apicoplasts.

      Weaknesses:

      While the studies conducted are rigorous and support the primary conclusions, the lack of experiments probing the molecular function of PfHO somewhat limits the impact of the work. Nevertheless, knowledge that PfHO is required for parasite viability and plays a role in the maintenance of apicoplasts is still an important advance.

      Comments on revisions:

      The authors thoughtfully addressed all the reviewer comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This important study reveals that the malaria parasite protein PfHO, though lacking typical heme oxygenase activity, is vital for the survival of Plasmodium falciparum. Structural and localization analyses showed that PfHO is essential for apicoplast maintenance, particularly in gene expression and biogenesis, indicating a novel adaptive role for this protein in parasite biology. While the results supporting the claims of the authors are convincing, the lack of data defining a molecular understanding or mechanism of action of the protein in question limits the impact of the study. 

      We appreciate the positive assessment. We agree that further mechanistic understanding of PfHO function remains a key future challenge. Indeed, we made extensive efforts to unravel the molecular interactions and mechanisms that underpin the critical function of PfHO. We elucidated key interactions between PfHO and the apicoplast genome, reliance of these interactions on the electropositive N-terminus, association of PfHO with DNA-binding proteins, and a specific defect in apicoplast mRNA levels upon PfHO knockdown. The major limitation we faced in further defining PfHO function is the general lack of understanding of apicoplast transcription and broader gene expression in this organelle. That limitation and the challenges to overcome it go well beyond our study and will require concerted efforts across several manuscripts (likely by multiple groups) to define the mechanistic features of apicoplast gene expression. We look forward to contributing further molecular understanding of PfHO function as broader understanding of apicoplast transcription emerges.

      Public Reviews:

      Reviewer #1 (Public Review):

      Malaria parasites detoxify free heme molecules released from digested host hemoglobins by biomineralizing them into inert hemozoin. Thus, why malaria parasites retain PfHO, a dead enzyme that loses the capacity of catabolizing heme, is an outstanding question that has puzzled researchers for more than a decade. In the current manuscript, the authors addressed this question by first solving the crystal structure of PfHO and aligning it with structures of other heme oxygenase (HO) proteins. They found that the N-terminal 95 residues of PfHO, which failed to crystallize due to their disordered nature, may serve as signal and transit peptides for PfHO subcellular localization. This was confirmed by subsequent microscopic analysis with episomally expressed PfHO-GFP and a GFP reporter fused to the first 83 residues of PfHO (PfHO N-term-GFP). To investigate the functional importance of PfHO, the authors generated an anhydrotetracycline (aTC) controlled PfHO knockdown strain. Strikingly, the parasites lacking PfHO failed to grow and lost their apicoplast. Finally, by chromatin immunoprecipitation (ChIP), quantitative PCR/RT-PCR, and growth assays, the authors showed that both the cognate N-terminus and HO-like domain were required for PfHO function as an apicoplast DNA interacting protein.

      The authors systematically performed multidisciplinary approaches to address this difficult question: what is the function of this enzymatically dead PfHO? I enjoyed reading this manuscript and its thoughtful discussion. This study is not only of clinical importance for antimalarial treatments but also deepens our understanding of protein function evolution. While I understand these experiments are challenging to conduct in malaria parasites, the data quality of some of the experiments could be improved. For example, most of the Western blots and Southern blots are not of high quality.

      We thank the reviewer for the positive comments but are a bit puzzled by the final statement about western and Southern blot quality. We agree that the two anti-PfHO western blots probed with custom antibody (Fig. 3- source data 2 and 8) have substantial background signal in the higher molecular mass region >75 kDa. However, we note that the critical region <50 kDa is clear in both cases and readily enables target band visualization. All other western blots probing GFP or HA epitopes are of high quality with minimal off-target background. We present two Southern blot images. We agree that the signal is somewhat faint for the Southern blot demonstrating on-target integration of the aptamer/TetR-DOZI plasmid (Fig. 3- fig. supplement 4), although we note that the correct band pattern for integration is visible. We also note that the accompanying genomic PCR data is unambiguous. The Southern blot for GFP-DHFRDD incorporation into the PfHO locus (Fig. 3- fig. supplement 1) has clear signal and strongly supports on-target integration. The minor background signal in the lower left region of the image does not extend into the critical lanes nor impact interpretation of correct clonal integration.

      As noted below, we have obtained a second western blot image to evaluate the decrease in PfHO protein expression in -aTC conditions. This revised image, which we now include in Fig. 3, shows clean detection of the PfHO signal in the critical molecular mass region below 40 kDa in +aTC conditions and substantial loss of this signal in -aTC conditions (relative to HSP60 loading control).

      Reviewer #2 (Public Review):

      Summary: 

      Blackwell et al. investigated the structure, localization, and physiological function of Plasmodium falciparum (Pf) heme oxygenase (HO). Pf and other malaria parasites scavenge and digest large amounts of hemoglobin from red cells for sustenance. To counter the potentially cytotoxic effects of heme, it is biomineralized into hemozoin and stored in the food vacuole. Another mechanism to counteract heme toxicity is through its enzymatic degradation via heme oxygenases. However, it was previously found by the authors that PfHO lacks the ability to catalyze heme degradation, raising the intriguing question of what the physiological function of PfHO is. In the current contribution, the authors determine that PfHO localizes to the apicoplast, determine its targeting sequence, establish the essentiality of PfHO for parasite viability, and determine that PfHO is required for proper maintenance of apicoplasts and apicoplast gene expression. In sum, the authors establish an essential physiological function for PfHO, thereby providing new insights into the role of PfHO in plasmodium metabolism. 

      Strengths: 

      The studies are rigorously conducted and the results of the experiments unambiguously support a role for PfHO as being an apicoplast-targeted protein required for parasite viability and maintenance of apicoplasts. 

      Weaknesses: 

      While the studies conducted are rigorous and support the primary conclusions, the lack of experiments probing the molecular function of PfHO limits the impact of the work. Nevertheless, the knowledge that PfHO is required for parasite viability and plays a role in the maintenance of apicoplasts is still an important advance.

      We appreciate the positive assessment. We agree that further mechanistic understanding of PfHO function remains a key future challenge. Indeed, we made extensive efforts to unravel the molecular interactions and mechanisms that underpin the critical function of PfHO. We elucidated key interactions between PfHO and the apicoplast genome, reliance of these interactions on the electropositive N-terminus, association of PfHO with DNA-binding proteins, and a specific defect in apicoplast mRNA levels upon PfHO knockdown. The major limitation we faced in further defining PfHO function is the general lack of understanding of apicoplast transcription and broader gene expression. That limitation and the challenges to overcome it go well beyond our study and will require concerted efforts across several manuscripts (likely by multiple groups) to define the mechanistic features of apicoplast gene expression. We look forward to contributing further molecular understanding of PfHO function as broader understanding of apicoplast transcription emerges.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      Specifically, I would like to see the expression of PfHO in the 3D7 strain and PfHO-aptamer/TetR-DOZI parasites detected by PfHO antibody on the same blot. The reason is that while most of the western blots show that PfHO appears as both pro- and processed-form, Figure 3-S5B shows only the processed-form of PfHO in all life stages of 3D7. It would be interesting to find out if the processing of PfHO1 is strain/stage-specific, and whether it is regulated by heme levels. It may also be interesting to find out if the pro-form of PfHO is also functional (i.e. mutate the cleavage site).

      We agree with the reviewer that Fig. 3- figure supplement 5B shows predominant detection of a single band for PfHO in untagged 3D7 parasites. In our experience, the detection of the unprocessed, pro form of PfHO can vary idiosyncratically with different experiments and cultures. In support of this variable detection of unprocessed PfHO in 3D7, we note in Fig. 3A that we detected both the unprocessed and processed forms of PfHO in a western blot of endogenously tagged PfHO-GFP-DHFRDD in 3D7 parasites with an intact apicoplast. We agree with the reviewer that future studies of stage-dependent processing of PfHO may give insights into conditions that favor or disfavor detection of the unprocessed protein. 

      Given prior evidence for vestigial heme binding by PfHO (Sigala et al. JBC 2012), we considered whether such heme binding might modulate PfHO expression, stability, and/or function. It is unknown if heme is present inside the apicoplast, and we currently lack evidence for heme-dependent function or expression by PfHO. Future studies can test this possible dependence.

      Regarding processing and possible function of the cleaved peptide, we note that the N-terminal 18 amino acids are expected to constitute the signal peptide that is cleaved co-translationally with import into the ER. Our data indicate that PfHO undergoes further processing upon import into the apicoplast to remove a further 15 residues. We currently have no evidence nor expectation that these additional residues contribute to PfHO function beyond targeting to the apicoplast.

      I am also confused as to why the authors used rabbit anti-PfHO and rabbit anti-Ef1α on the same blot for Figure 3C, which makes it difficult to appreciate the expression changes of PfHO. Given the high non-specific background of PfHO antibody shown by other Western blots (Figure 3 - Source data 2), I would like to see a blot stained with only PfHO antibody to show that expression of PfHO has been efficiently reduced in the absence of aTC. 

      Bands for Ef1α (50 kDa) and untagged PfHO (~32 kDa) are readily distinguished by western blot analysis based on their distinct molecular masses and electrophoretic mobilities. We agree that staining with the anti-PfHO antibody resulted in background bands in other regions of the gel image, especially in the higher molecular mass region >75 kDa. We note that additional strong evidence for down-regulation of PfHO expression is provided in Fig. 3- figure supplement 6, which shows specific loss of PfHO mRNA transcript levels in -aTC conditions by RT-qPCR. 

      Nevertheless, we have followed the reviewer’s suggestion and provided a new WB image of PfHO expression ±aTC (probed only with rabbit anti-PfHO antibody) that shows strong down-regulation of PfHO protein levels in -aTC conditions, consistent with the strong growth phenotype observed. We have inserted this revised, cleaner western blot image into Fig. 3 (along with detection of HSP60 levels in replicate samples as loading control) and placed the prior image into Fig. 3- figure supplement 6. In both cases, densitometry analysis indicates an 80-85% reduction in PfHO levels in -aTC conditions.

      The authors proposed that PfHO interacts with apicoplast genome DNA via the electropositive N-terminus. Interestingly, these positively charged residues are not conserved between Plasmodium, Theileria, and Babesia. I will be curious to follow the authors' future work to investigate the function of this electropositive N-terminus, possibly by comparative and mutagenesis analysis. 

      We agree that further molecular studies of DNA-binding determinants by PfHO and its N-terminus will be insightful.

      The Quantitative RT-PCR analysis revealed that loss of PfHO specifically resulted in decreased apicoplast RNA. I wonder if the authors plan to conduct RNAseq analysis on the PfHO knockdown strain across multiple life stages, to get a clearer picture of PfHO function in malaria parasites. 

      Our RT-qPCR data across multiple asexual stages prior to organelle loss indicate that abundance of all apicoplast-encoded transcripts drops precipitously and uniformly upon PfHO knockdown (Fig. 5- figure supplement 7). Given the small size of the apicoplast genome and the polycistronic nature of apicoplast transcription, we assume that RNA-Seq studies would result in a similar observation. We hypothesize that PfHO knockdown and subsequent dysfunctions may interfere with RNA polymerase assembly on DNA and/or processivity. We are currently testing these hypotheses.

      I noticed that the authors did not discuss the function of PfHO in apicoplast organelle biogenesis. Since ClpM (previously termed ClpC) is the only apicoplast-encoded Clp subunit that is essential for apicoplast biogenesis, do the authors think that PfHO knockdown parasites lost their apicoplast due to decreased ClpM expression? If that were the case, would episomal expression or nuclear knock-in of ClpM rescue PfHO deficiency in the absence of isopentenyl pyrophosphate (IPP)?

      We share the reviewer’s curiosity to understand how loss of apicoplast transcripts leads to organelle dysfunction and defective IPP synthesis. We agree that ClpM function may be critical to import of nuclear-encoded proteins necessary for apicoplast function. SufB encoded on the apicoplast genome is also expected to be essential for Fe-S cluster synthesis in the apicoplast and to be required for Fe-S-dependent IPP synthesis. We have expanded the first Discussion section to address these possible connections.

      Minor: 

      (1) None of the microscopy photos have scale bars. 

      We have added scale bars to all microscopy images.

      (2) Multiple microscopy pictures show strange patches around the fluorescent signals (a grey square distinguishable from the black background). This is especially evident in Figure 2 S2. Was it caused by the reduction of the original pictures?

      We have reviewed all fluorescence microscopy images but are unable to identify the issue noted by the reviewer. We have uploaded new versions of all images to include scale bars (as requested above), and we hope that this update resolves the issue observed by the reviewer. We are happy to further troubleshoot and address if the reviewer continues to see these artifacts and can provide further information.

      (3) A description of how Southern blotting was performed is missing. 

      We thank the reviewer for bringing this omission to our attention. We have added a description of the Southern blot methods to the section on genome editing.

      (4) Figure 3B: should be "αGFP: 12nm", not "αPfHO1: 12nm". 

      We have modified this labeling to read “αGFP (PfHO): 12 nm”.

      (5) Figure 3C: which clone of PfHO knockdown was used in all the following figures? How many clones were tested in the following figures (did they show consistent phenotype)? 

      The polyclonal culture of PfHO-aptamer/TetR-DOZI knockdown parasites from transfection 11 was used for growth assay and western blot experiments, since there was no evidence by PCR or Southern blot for the wildtype PfHO locus. We have elaborated on these details in the Methods section.

      Reviewer #2 (Recommendations For The Authors): 

      In Figure 2 and Figure 3B, to address rigor and reproducibility, the authors should state the number of parasites analyzed and if there was any variation in localization. For instance, did all of the parasites analyzed have apicoplast localization of heme oxygenase or was there a distribution of apicoplast and non-apicoplast localization? 

      Localization by fluorescence microscopy of episomal and endogenous tagged PfHO is presented in Fig. 2, Fig. 2- fig. supplements 1 and 2, and Fig. 3- fig. supplement 2. Localization by immunogold EM is presented in Fig. 3B and Fig. 3- fig. supplement 3. In all cases 3-4 representative images are presented that support exclusive localization of PfHO to the apicoplast. We imaged ≥10-20 additional parasites in all cases (and across distinct transfections and biological samples) that also supported exclusive localization to the apicoplast. We have modified the figure legends and methods description to note these replicate values. Finally, we note that IPP rescue of parasite viability upon PfHO knockdown strongly supports the conclusion that the critical and essential function of PfHO impacts the apicoplast, consistent with its exclusive detection in that organelle by microscopy.

    1. eLife Assessment

      The authors present a critique of current usage of principal component analysis in geometric morphometrics, making a compelling case with benchmark data that standard techniques perform poorly. The work is an important contribution to the field and will hopefully lead to a reassessment of the methodology most scientists in morphometrics currently use. This work challenges a very commonly used analytical approach and is bound to raise some controversy in the community, but the authors' critique is based on a well-founded and well-thought-out analysis.

    2. Reviewer #1 (Public review):

      Mohseni and Elhaik have critically examined the widespread use of principal component analysis (PCA) in phylogenetic inferences within the discipline of physical anthropology. The authors present compelling evidence that PCA underperforms compared to machine learning (ML) classifiers. This excellent work not only challenges the reliability of PCA-based taxonomic inferences, but also adds to a growing body of literature questioning the application of PCA in physical anthropology, thereby initiating a fruitful discussion in our field. Moreover, it underscores the crucial need for external validation methods in such studies.

      The authors have addressed nearly all of my comments, and my questions have been fully answered. The revised manuscript represents a significant improvement.

      The new title more effectively conveys the central message emerging from this research; the revised introduction more precisely addresses the methodological challenges currently facing the discipline.

      I am equally amazed by the profound susceptibility of the PCA results, as demonstrated by the alterations introduced by the authors, and by the contrasting robustness of the ML classifiers. I trust that this contrast will spark a fruitful discussion about the application of both methods in our field. It should also inspire further research conducted by physical anthropologists to study the role of ML in this discipline.

      Lastly, and importantly, I believe the authors should be commended for addressing the broader implications of their work, particularly in relation to public perceptions of science (pp. 20-21).

    3. Reviewer #3 (Public review):

      Mohseni and Elhaik challenge the widespread use of PCA as an analytical and interpretive tool in the study of geometric morphometrics. The standard approach in geometric morphometrics analysis involves Generalised Procrustes Analysis (GPA) followed by Principal Component Analysis (PCA). Recent research challenges PCA outcomes' accuracy, robustness, and reproducibility in morphometrics analysis. In this paper, the authors demonstrate that PCA is unreliable for such studies. Additionally, they test and compare several Machine-Learning methods and present MORPHIX, a Python package of their making that incorporates the tools necessary to perform morphometrics analysis using ML methods.

      Mohseni and Elhaik conducted a set of thorough investigations to test PCA's accuracy, robustness, and reproducibility following renewed recent criticism and publications where this method was abused. Using a set of 2D and 3D morphometric benchmark data, the authors performed a traditional analysis using GPA and PCA, followed by a reanalysis of the data using alternative classifiers and rigorous testing of the different outcomes.

      In the current paper, the authors evaluated eight ML methods and compared their classification accuracy to traditional PCA. Additionally, common occurrences in the attempted morphological classification of specimens, such as non-representative partial sampling, missing specimens, and missing landmarks, were simulated, and the performance of PCA vs ML methods was evaluated.

      Comments on revisions:

      I have gone over the revised manuscript and the detailed responses to the previous round of review. While there are places where I personally would have used slightly toned-down phrasing, the authors get to set the tone of their manuscript, and I will not argue with that any further.

      In general, the restructuring, addition of new paragraphs, minor revisions and new title make for a much better manuscript, which as stated in the previous review, will be a valuable resource for workers in the field.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment 1. Mohseni and Elhaik's article offers a critical evaluation of Geometric Morphometrics (GM), a common tool in physical anthropology for studying morphological differences and making phylogenetic inferences. I read their article with great interest, although I am not a geneticist or an expert on PCA theory since the problem of morphology-based classification is at the core of paleoanthropology.

      The authors developed a Python package for processing superimposed landmark data with classifier and outlier detection methods, to evaluate the adequacy of the standard approach to shape analysis via modern GM. They call into question the accuracy, robustness, and reproducibility of GM, and demonstrate how PCA introduces statistical artefacts specific to the data, thus challenging its scientific rigor. The authors demonstrate the superiority of machine learning methods in classification and outlier detection tasks. The paper is well-written and provides strong evidence in support of the authors' argument. Thus, in my opinion, it constitutes a major contribution to the field of physical anthropology, as it provides a critical and necessary evaluation of what has become a basic tool for studying morphology, and of the assumptions allowing its application for phylogenetic inferences. Again, I am not an expert in these statistical methods, nor a geneticist, but the authors' contribution is of substantial relevance to our field (physical anthropology). The examples of NR fossils and HLD 6 are cases in point, in line with other notable examples of critical assessment of phylogenetic inferences made on the basis of PCA results of GM analysis. For example, see Lordkipanidze et al.'s (2014) GM analyses of the Dmanisi fossils, suggesting that the five crania represent a single regional variant of Homo erectus; and see Schwartz et al.'s (2014) comment on their findings, claiming that the dental, mandibular, and cranial morphology of these fossils suggest taxic diversity. Schwartz et al. (2014) ask, "Why did the GMA of 78 landmarks not capture the visually obvious differences between the Dmanisi crania and specimens commonly subsumed H. erectus? ... one wonders how phylogenetically reliable a method can be that does not reflect even easily visible gross morphological differences" (p. 360).

      As an alternative to the PCA step in GM, the authors tested eight leading supervised learning classifiers and outlier detection methods on three-dimensional datasets. The authors demonstrated inconsistency of PCA clustering with the taxonomy of the species investigated for the reconstruction of their phylogeny, by analyzing a database comprising landmarks of 6 known species that belong to the Old World monkeys tribe Papionini, using PCA for classification. The authors also demonstrated that high explained variance should not be used as an estimate of high accuracy (reliability). Then, the authors altered the dataset in several ways to simulate the characteristic nature of paleontological data.

      The authors excluded taxa from the database to study how PCA and alternative classifiers are affected by partial sampling, and the results presented in Figures 4 and 5, among others, are quite remarkable in showing the deviations from the benchmark data. These results expose the perils of applying PCA and GM for interpreting morphological data. Furthermore, they provide evidence showing that the alternative classifiers are superior to PCA, and that they are less susceptible to experimenter intervention. Similar results, i.e., inconsistencies in the PC plots, were obtained in examinations of the effect of removing specimens from the dataset and in the interesting test of removing landmarks to simulate partial morphological data, as is often the case with fossils. To test the combined effect of these data alterations, the authors combined removal of taxa, specific samples, and landmarks from the dataset. In this case, as well, the PCA results indicate deviation from the benchmark data. However, the ML classifiers could not remedy the situation. The authors discuss how these inconsistencies may lead to different interpretations of the data, and in turn, different phylogenetic conclusions. Lastly, the authors simulated the situation of a specimen of unknown taxonomy using outlier detection methods, demonstrating LOF's ability to identify a novelty in the morphospace.

      References

      Bookstein FL. 1991. Morphometric tools for landmark data: geometry and biology [Orange book]. Cambridge New York: Cambridge University Press.
      Cooke SB, and Terhune CE. 2015. Form, function, and geometric morphometrics. The Anatomical Record 298:5-28.
      Lordkipanidze D, et al. 2013. A complete skull from Dmanisi, Georgia, and the evolutionary biology of early Homo. Science 342: 326-331.
      Schwartz JH, Tattersall I, and Chi Z. 2014. Comment on "A complete skull from Dmanisi, Georgia, and the evolutionary biology of Early Homo". Science 344(6182): 360-a.

      The reviewer considered that our “contribution is of substantial relevance to our field (physical anthropology).” We are grateful for this evaluation and for the thorough review and insightful comments on our manuscript, which helped us improve its quality further. Your remarks regarding the superiority of machine learning methods over traditional GM approaches, as well as the challenges and implications highlighted in our findings, resonate deeply with the core objectives of our research. The references to previous studies and their relevance to our work underscore the broader implications of our findings for the interpretation of morphological data in evolutionary studies. We are thankful for your remarks regarding the debate surrounding the Dmanisi fossils. We covered it in our introduction (lines 161-174):

      "Finally, PCA also played a part in the much-disputed case of the Dmanisi hominins (39, 40). These early Pleistocene hominins, whose fossils were recovered at Dmanisi (Georgia), have been a subject of intense study and debate within physical anthropology. Despite their small brain size and primitive skeletal architecture, the Dmanisi fossils represent Eurasia’s earliest well-dated hominin fossils, offering insights into early hominin migrations out of Africa. The taxonomic status of the Dmanisi hominins has been initially classified as Homo erectus or potentially represented a new species, Homo georgicus or else (40, 41). Lordkipanidze et al.’s (42) geometric morphometrics analyses suggested that the variation observed among the Dmanisi skulls may represent a single regional variant of Homo erectus. However, Schwartz et al. (2014) (43) raised concerns about the phylogenetic inferences based on PCA results of the geometric morphometrics analysis, noting the failure of the method to capture visually obvious differences between the Dmanisi crania and specimens commonly subsumed under Homo erectus."

      Comment 2. I suggest moving all the interpretations from the Results section to the Discussion section. This will enhance the flow of the results and make it easier to follow.

      We tried that, but it made the manuscript less readable. Because our manuscript makes two strong statements, one about the unsuitability of PCA to the field and one about the many other problems in the field, as demonstrated through several test cases, it is better to keep them separate in the Results and Discussions, respectively.

      Comment 3. I recommend conducting an English language edit on the text to address minor inconsistencies.

      We thoroughly edited the text to enhance the language style and consistency. We thank the reviewer for the suggestion.

      Comment 4. Line 21, what do you mean by "ontogenists"?

      Individuals who are versed in or study ontogeny.

      Comment 5. When referring to the remains from Nesher Ramla (Israel), I recommend using "NR fossils". Thus, in line 34, I suggest replacing "Homo Nesher Ramla" by "Nesher Ramla fossils (NR fossils)", also in line 122.

      We replaced "Homo Nesher Ramla" with "Nesher Ramla fossils (NR fossils)" in all of the instances throughout the manuscript. We thank the reviewer for the suggestion.

      Comment 6. Line 34, I suggest replacing "human" by "hominin".

      (Line 35) We replaced "human" with "hominin".

      “…, such as the case of Homo Nesher Ramla, an archaic hominin with a questionable taxonomy.”

      We thank the reviewer for the suggestion.

      Comment 7. Line 67-68, I suggest clarifying the classification of landmarks using the definition of landmark types (Bookstein, 1991; also see summary by Cooke and Terhune (2015) - Table 1).

      We revised our summary of the classification of landmarks: (Lines 83-94). Our MS now reads:

      “Determining sufficient measurements and data points for a valid morphometric analysis is older than modern geometric morphometrics (19). In geometric morphometrics, landmarks are discrete points on biological structures used to capture shape variation. Bookstein (20) categorised landmarks into three types: Type one, representing the juxtaposition of tissues such as the intersection of two sutures; Type two, denoting maxima of curvature like the deepest point in a depression or the most projecting point on a process; and Type three, which includes extremal points defined by information from other locations on the object, such as the endpoint or centroid of a curve or feature. Originally, Type three landmarks encompassed semi-landmarks, but Weber and Bookstein (21) refined this classification, identifying Type three landmarks as those characterised by information from multiple curves and symmetry, including the intersection of two curves or the intersection of a curve and a suture, and further subdividing them into three subtypes (3a, 3b, 3c) (15). While landmarks provide crucial information about the structure’s overall shape, semi-landmarks capture fine-scale shape variation (e.g., curves or surfaces) that landmarks alone cannot adequately represent. Semi-landmarks are heavily relied upon as the source of shape information to break the continuity of regions in the specimen without clearly identifiable landmarks (22). Semi-landmarks are typically aligned based on their relative positions to landmarks, allowing for the comprehensive analysis of shape changes and deformations within complex structures (2). Unsurprisingly, the use of semi-landmarks is controversial. For instance, Bardua et al. (23) claim that high-density sliding semi-landmark approaches offer advantages compared to landmark-only studies, while Cardini (24) advises caution about potential biases and subsequent inaccuracies in high-density morphometric analyses.”

      We thank the reviewer for the suggestion.

      Comment 8. Line 84, "beneficial over" - I suggest revising.

      (Line 102) We revised the sentence and used “offer advantages” instead.

      “… claim that high-density sliding semi-landmark approaches offer advantages compared to landmark-only studies.”

      We thank the reviewer for the suggestion.

      Comment 9. Line 97, do you mean "therefore"?

      (Line 115) Yes, we replaced "thereby" with "therefore".

      Comment 10. Line 116, I suggest rephrasing as follows: "newly discovered hominin fossils with respect to...".

      (Lines 135, 136) We rephrased it as suggested:

      “is the classification of newly discovered hominin fossils within the human phylogenetic tree”

      We thank the reviewer for the suggestion.

      Comment 11. Line 119, please clarify or explain what you mean by subjective determination of clustering in PCA plots.

      We rephrased (Lines 137, 138) to read:

      "However, which specimens should be included in clusters and which ones should be considered outliers is determined subjectively…"

      We thank the reviewer for the suggestion.

      Comment 12. Lines 146-148: consider revising to clarify the sentence; "than" in line 147 should be "that".

      (Lines 196, 197) We modified the sentence and replaced "than" with "that":

      " … that even the criticism from its pioneers was dismissed"

      We thank the reviewer for the suggestion.

      Comment 13. Line 213: I recommend adding the phylogenetic tree of the Papionini tribe. This would be particularly relevant for the interpretation of the results, e.g., in lines 324-328.

      The reviewer suggested adding a phylogenetic tree of the Papionini tribe to increase the interpretability of our results. We added two trees (Figure 3) based on the molecular phylogeny of extant papionins and the most parsimonious tree generated from the initial Collard and Wood (1).

      We thank the reviewer for the suggestion.

      Comment 14. Lines 244-248: I recommend that the parallels drawn between the results presented in this section and other cases of PCA analysis interpretation (e.g., the NR fossils) are transferred to the Discussion section.

      This would allow a more fluent read of the results.

      Thank you, we considered that but found that it does not improve the readability of the discussion, because this is a very technical issue that would be best understood alongside the specific use case that tests it.

      Comment 15. Line 301: The word "are" should be placed before the word "all".

      (Line 319) We modified accordingly and placed "are" before "all":

      “Rarely are all related taxa represented;”

      We thank the reviewer for the suggestion.

      Comment 16. Line 426: I suggest "omissions" in place of "missingness".

      (Line 435) We replaced "missingness" with "omissions".

      We thank the reviewer for the suggestion.

      Comment 17. Line 440 is part of the caption for Figure 6. Please add a description of what the red arrow indicates in every figure in which it appears.

      Yes, we added a sentence to the caption of figures 7 and 8:

      “The red arrow in subfigures A, B, and C marks a Lophocebus albigena (pink) sample whose position in PC scatterplots is of interest.”

      We thank the reviewer for the suggestion.

      Comment 18. Line 454: I recommend "partial morphological information" instead of "some form information".

      (Lines 446, 447) We made modifications and replaced "some form information" with "partial morphological information":

      “Newfound samples often comprise incomplete osteological remains or fossils (18, 22) and only present partial morphological information.”

      We thank the reviewer for the suggestion.

      Comment 19. Line 547: I suggest "portion" instead of "fracture".

      (Lines 470, 471) We replaced "fracture" with "portion":

      “Thereby, while the complete skull would cluster with its own taxon…”

      We thank the reviewer for the suggestion.

      Comment 20. Lines 664-665 should read "anatomy and physical anthropology".

      (Lines 600-602) We modified the text accordingly:

      “There are various approaches in morphometrics, but among them, geometric morphometrics has left an indelible mark on biology, especially in anatomy and physical anthropology.”

      We thank the reviewer for the suggestion.

      Comment 21. Lines 684-699: This paragraph seems to belong in the introduction section.

      (Lines 175-190) We modified it and moved it to the introduction:

      “Visual interpretations of the PC scatterplots are not the only role PCA plays in geometric morphometrics. Phylogenetic Principal Component Analysis (Phy-PCA) (44) and Phylogenetically Aligned Component Analysis (PACA) (45) are both used in geometric morphometrics to analyse shape variation while considering the supposed phylogenetic relationships among species. They differ in their approach to aligning landmark configurations and the role of PCA within them. Phy-PCA incorporates phylogenetic information by utilising a phylogenetic tree to model the evolutionary history of the species. This method aims to separate shape variation resulting from shared evolutionary history from other sources of variation. PCA plays a similar role in performing dimensionality reduction on the aligned landmark configurations in Phy-PCA (44). PACA takes a different approach to alignment. It uses a Procrustes superimposition method based on a phylogenetic distance matrix, aligning the landmark configurations according to the evolutionary relationships among species. PCA is then applied to the aligned configurations to extract the principal components of shape variation (45). Both analyses provide insights into the patterns and processes that shape biological form diversity while considering phylogenetic relationships, yet they are also subjected to the limitations and biases inherent in relying on PCA as part of the process.”

      We thank the reviewer for the suggestion.

      Comment 22. Line 717: I suggest "fossils" instead of "hominins".

      (Lines 636, 637) We modified it accordingly and replaced "hominins" with "fossils":

      “…which reflect the restraints faced in morphometric analysis of ancient samples (e.g., fossils).”

      We thank the reviewer for the suggestion.

      Comment 23. Line 728: the word "the" should be deleted; Skhul V should not be italicized, and so do the words "Mount Carmel"; "Neandertals"; "modern humans"; and "Late Paleolithic" in the following lines.

      (Lines 647-651) We made modifications accordingly:

      “For example, Harvati (27), who analysed the Skhul 5 (84), a 40,000-year-old human skull from Mount Carmel (Israel), proposed diverging hypotheses based on favourable PC outcomes (based on PC8 separating it from Neanderthals and modern humans and associating it with the Late Palaeolithic specimen and based on PC12 associating it with modern humans).”

      We thank the reviewer for the suggestion.

      Comment 24. Line 734: the first comma should be deleted.

      (Line 653) We deleted the first comma:

      “(Figures 5-12) show that compared to the benchmark (Figure 4), …”

      We thank the reviewer for the suggestion.

      Reviewer #2:

      Comment 1. I completely agree with the basic thrust of this study. Yes, of course, machine learning is FAR better than any variant of PCA for the paleosciences. I agree with the authors' critique early on that this point is not new per se - it is familiar to most of the founders of the field of GMM, including this reviewer. A crucial aspect is the dependence of ALL of GMM, PCA or otherwise, on the completely unexamined, unformalized praxis by which a landmark configuration is designed in the first place. I must admit that I am stunned by the authors' estimate of over 32K papers that have used PCA with GMM.

      We thank the reviewer for accepting the premise of our study.

      But beating a dead horse is not a good way of designing a motor vehicle. I think the manuscript needs to begin with a higher-level view of the pathology of its target disciplines, paleontology and paleoanthropology, along the lines that David demonstrated for numerical taxonomy some decades ago. That many thousands of bad methodologies require some sort of explanation all of their own in terms of (a) the fears of biologists about advanced mathematics, (b) the need for publications and tenure, (c) the desirability of covers of Nature and Science, and (d) the even greater glory of getting to name a new "species." This cumulative pathology of science results in paleoanthro turning into a branch of the humanities, where no single conclusion is treated as stable beyond the next dig, the next year or so of applied genomics, and the next chemical trace analysis. In short, the field is not cumulative.

      Given the wide popularity of PCA and the attempts to prevent data replication to show its limitations, we do not believe that we are beating a dead horse, but a very live beast that threatens the integrity of the entire field. We accept the second part of the analogy about developing a motor vehicle.

      We also accepted the reviewer’s suggestion and developed the suggested paragraph:

      " A major contribution to the field was made by Sokal and Sneath’s Principles of Numerical Taxonomy (9) book, which challenged traditional taxonomic theory as inherently circular and introduced quantitative methods to address questions of classification (see also review by Sneath (10)). Hull (11) claimed that evolutionary reasoning practiced in taxonomy is not inherently circular but rather unwarranted. He argued that such criticism was based on misunderstandings of the logic of hypothesising, which he attributed to an unrealistic desire for a mistake-proof science. He contended that scientific hypotheses should begin with insufficient evidence and be refined iteratively as new evidence emerges. However, some taxonomists preferred a more rigid, hierarchical approach to avoid the appearance of error. As a result of these and other criticisms, traditional taxonomy declined in favour of cladistics and molecular systematics, which provided more accurate and evolutionarily informed classifications.

      Today, palaeontology and palaeoanthropology grapple with methodological challenges that compromise the stability of their conclusions. These issues stem from various factors, including biologists’ apprehensions towards advanced mathematics, the pressure to publish for career advancement (12), the pursuit of high-profile journal covers, and the prestige associated with naming new species. As a result, these fields often resemble a branch of biology where the latest discoveries or new analytical techniques frequently overturn previous findings. This lack of cumulative knowledge necessitates a more rigorous approach to methodology and interpretation in morphometrics to ensure that conclusions are robust and enduring."

      It is not obvious that the authors' suggestion of supervised machine learning will remedy this situation, since (a) that field itself is undergoing massive changes month by month with the advent of applications AI, and even more relevant (b) the best ML algorithms, those based on deep neural nets, are (literally) unpublishable - we cannot see how their decisions have actually been computed. Instead, to stabilize, the field will need to figure out how to base its inferences on some syntheses of actual empirical theories.

      We appreciate the reviewer’s insightful comments and concerns regarding the use of supervised machine learning in our study. We acknowledge the rapid advancements in the field of machine learning and its significant impact on various domains, including geometric morphometrics. Although we are aware of the ongoing integration of machine learning techniques in geometric morphometrics, our objective was to thoroughly investigate some of the conventional and more frequently used models for comparative analysis.

      Our intention was also to develop a Python module that enables users to easily apply these models to their landmark data. We recognise that most users typically apply machine learning methods to the principal component analysis (PCA) of their landmark data (2), unless PCA fails to explain enough variance (3), as we discussed in the context of Linear Discriminant Analysis (LDA). Our study demonstrates that these machine learning methods can be directly applied after generalised Procrustes analysis (GPA), without necessitating PCA as an intermediary step. This highlights another significant point of our research: the often automatic and potentially unnecessary use of PCA in geometric morphometrics.

      Furthermore, we acknowledge that the availability of more extensive data might have allowed us to explore more complex methods, such as neural networks. However, neural networks require a substantial amount of data due to their numerous learning parameters, which we did not possess in this study. It is also evident that not every algorithm is suitable for every situation. Our findings revealed that simpler models, such as the nearest neighbours classifier, which do not even have a training phase, performed exceptionally well. Additionally, the nearest neighbours classifier offers the desired transparency and interpretability, addressing the reviewer’s concern regarding the opacity of more complex models.
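
      The workflow described above, superimposing landmarks and then classifying the aligned coordinates directly with no PCA step in between, can be sketched in a few lines. This is only an illustrative sketch under stated assumptions and not the MORPHIX API: the function names (`normalise`, `align`, `gpa`, `nearest_neighbour`, `transform`) are hypothetical, the landmarks are restricted to 2D and stored as complex numbers so that the optimal rotation has a closed form, reflections are not handled, and the squares and rectangles are toy data rather than the papionin dataset.

```python
import cmath
import math


def normalise(shape):
    """Centre a 2D landmark configuration and scale it to unit centroid size."""
    centroid = sum(shape) / len(shape)
    centred = [p - centroid for p in shape]
    size = math.sqrt(sum(abs(p) ** 2 for p in centred))
    return [p / size for p in centred]


def align(shape, reference):
    """Rotate a normalised shape onto a normalised reference (least squares)."""
    s = sum(w * z.conjugate() for w, z in zip(shape, reference))
    rotation = s.conjugate() / abs(s)  # unit complex number = optimal rotation
    return [rotation * w for w in shape]


def gpa(shapes, iterations=10):
    """Simplified generalised Procrustes analysis (no reflection handling)."""
    shapes = [normalise(s) for s in shapes]
    reference = shapes[0]
    for _ in range(iterations):
        aligned = [align(s, reference) for s in shapes]
        # The new reference is the re-normalised mean of the aligned shapes.
        reference = normalise([sum(pts) / len(pts) for pts in zip(*aligned)])
    return [align(s, reference) for s in shapes], reference


def nearest_neighbour(query, aligned, labels, reference):
    """Label a new specimen by its closest aligned training shape (1-NN)."""
    q = align(normalise(query), reference)

    def distance(a, b):
        return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

    best = min(range(len(aligned)), key=lambda i: distance(q, aligned[i]))
    return labels[best]


def transform(shape, angle, scale, shift):
    """Toy helper: rotate, scale, and translate a configuration."""
    return [scale * cmath.exp(1j * angle) * p + shift for p in shape]


# Toy data (assumption): four landmarks per specimen, two "taxa".
square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
rect = [0 + 0j, 3 + 0j, 3 + 1j, 0 + 1j]
training = [
    transform(square, 0.3, 2.0, 5 + 1j),
    transform(square, 1.2, 0.5, -2 + 3j),
    transform(rect, 0.7, 1.5, 1 - 4j),
    transform(rect, 2.0, 3.0, 0 + 0j),
]
labels = ["square", "square", "rectangle", "rectangle"]

aligned, reference = gpa(training)
query = transform(rect, -1.0, 0.8, 7 + 7j)
predicted = nearest_neighbour(query, aligned, labels, reference)
print(predicted)
```

      Because the classifier only stores the aligned training shapes and compares distances, every decision can be traced back to a specific training specimen, which is the kind of transparency referred to above. A real analysis would use 3D landmarks, a tested Procrustes implementation, and cross-validated accuracy rather than a single query.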

      We hope this clarifies our approach and objectives, and we sincerely thank the reviewer for their valuable feedback, which has helped us refine our study and its presentation.

      It's not that this reviewer is cynical, but it is fair to suggest a revision conveying a concern for the truly striking lack of organized skepticism in the literature that is being critiqued here. A revision along those lines would serve as a flagship example of exactly the deeper argument that reference (17) was trying to seed, that the applied literature obviously needs a hundred times more of. Such a review would do the most good if it appeared in one of the same journals - AJBA, Evolution, Journal of Human Evolution, Paleobiology - where the bulk of the most highly cited misuses of PCA themselves have appeared.

      First, we do not believe that this reviewer is cynical, and we hope they will not consider us cynical if we point out that the field has thus far largely ignored previous reports of PCA misuses published in those journals, like the excellent Bookstein 2019 (4) paper, so perhaps a different approach is needed with a different journal.

      Second, our MS is not a review. We agree with the reviewer that a review of PCA critical papers is of value. We changed the title of our study to make it easier to find, and we thank the reviewer for the comment. 

      Reviewer #3:

      Comment 1. Mohseni and Elhaik challenge the widespread use of PCA as an analytical and interpretive tool in the study of geometric morphometrics. The standard approach in geometric morphometrics analysis involves Generalised Procrustes Analysis (GPA) followed by Principal Component Analysis (PCA). Recent research challenges PCA outcomes' accuracy, robustness, and reproducibility in morphometrics analysis. In this paper, the authors demonstrate that PCA is unreliable for such studies. Additionally, they test and compare several Machine-Learning methods and present MORPHIX, a Python package of their making that incorporates the tools necessary to perform morphometrics analysis using ML methods.

      Mohseni and Elhaik conducted a set of thorough investigations to test PCA's accuracy, robustness, and reproducibility following renewed recent criticism and publications where this method was abused. Using a set of 2D and 3D morphometric benchmark data, the authors performed a traditional analysis using GPA and PCA, followed by a reanalysis of the data using alternative classifiers and rigorous testing of the different outcomes.

      In the current paper, the authors evaluated eight ML methods and compared their classification accuracy to traditional PCA. Additionally, common occurrences in the attempted morphological classification of specimens, such as non-representative partial sampling, missing specimens, and missing landmarks, were simulated, and the performance of PCA vs ML methods was evaluated.

      This is a correct description of our MS.

      The main problem with this manuscript is that it is three papers rolled into one, and the link doesn't work.

      We agree that the manuscript is comprehensive and can probably be broken down into more than one manuscript. However, we do not adhere to the philosophies of the least publishable unit (LPU), the smallest publishable unit (SPU), or the minimum publishable unit (MPU). Instead, we believe in producing high-quality and encompassing studies.

      We checked the link thoroughly and ensured it is functional. Thank you for your comment.

      The title promises a new Python package, but the actual text of the manuscript spends relatively little time on the Python package itself and barely gives any information about the package and what it includes or its usefulness. It is definitely not the focus of the manuscript. The main thrust of the manuscript, which takes up most of the text, is the analysis of the papionin dataset, which shows very convincingly that PCA underperforms in virtually all conditions tested.

      We agree. We revised the title to reflect the main issue of the paper. Thank you for your comment.

      In addition, the manuscript includes a rather vicious attack against two specific cases of misuse of PCA in paleoanthropological studies, which does not connect with the rest of the manuscript at all.

      We consider these case studies of the use of PCA, which resonate with our ultimate goal. First, the previous reviewer suggested that we are beating a “dead horse.” We provide very recent and high-profile test cases to support our position that PCA is a popular and widely used method. Second, we wish to show how researchers use data alterations to cherry-pick results. Third, we focus on one of the use cases (the Homo NS) to demonstrate the poor scientific practices prevalent in this field, such as refusing to share data and breaking Science’s policies to protect this act.

      If the manuscript is a criticism of PCA techniques, this should be reflected in the title. If it is a report of a new Python package, it should focus on the package. Otherwise, there should be two separate manuscripts here.

      It is a criticism of PCA, and it is now reflected in the title; thank you again.

      The criticism of PCA is valid and important. However, pointing out that it is problematic in specific cases and is sometimes misused does not justify labeling tens of thousands of papers as questionable and does not justify vilifying an entire discipline. The authors do not make a convincing enough case that their criticism of the use of PCA in analyzing primate or hominin skulls is relevant to all its myriad uses in morphometrics. The criticism is largely based on statistical power, but it is framed as though it is a criticism of geometric morphometrics in general.

      We appreciate the opportunity to address the concerns raised regarding our critique of PCA. The reviewer argues that because we analyzed only primate skulls, we cannot extrapolate that PCA will be biased in analyzing other data (other taxa or other usages). Using the same logic, we can also argue that PCA cannot be used to study NEW taxa and certainly not to detect NOVEL taxa because it was never shown to apply to these taxa. We can further argue that PCA cannot be used to study ANY taxa since it was never shown to yield correct results (PCA results are justified through circular reasoning and are adjusted when they do not show the desired results). However, that part of our answer is not a defense of our method but rather a further criticism of the field.

      To answer the question more directly, our criticism of PCA is rooted in empirical evidence and robust research, including studies by Elhaik (5) and others (6, 7), demonstrating that PCA lacks the power to produce accurate and reliable results. If the reviewer believes that using cats instead of primates will somehow boost the accuracy of PCA, they should, at the very least, explain what morphological properties of cats justify this presumption. Concerning the case of other usages, we clearly noted that “the scope of our study was limited to PCA usage in geometric morphology.”  The reviewer did not explain why our analysis is not “convincing enough,” so we cannot address it.

      As you know, this issue extends beyond the specific case study of primate or hominin skulls in our research. PCA is heavily relied upon in the field, often without sufficient scrutiny of its limitations. Our intention is not to vilify an entire discipline but to highlight the pervasive and sometimes unquestioning reliance on PCA across many studies in geometric morphometrics. Calling for the reevaluation of studies based on a problematic method is not vilification; it is, by definition, science.

      While we understand the concern about the generalisability of our findings, our critique is based on the inherent limitations of PCA itself, not merely on statistical power. PCA lacks measurable power, a test of significance, and a null model. Its outcomes are highly sensitive to the input data, making them susceptible to manipulation and interpretation. Moreover, the ability to evaluate various dimensions allows for cherry-picking of results, where different outcomes can be equally acceptable, thus undermining the robustness of conclusions drawn from PCA.

      We invite the reviewer to examine the mathematical basis of PCA as demonstrated in Figure 1 of Elhaik (2022) (https://www.nature.com/articles/s41598-022-14395-4/figures/1). We ask the reviewer to explain what in this straightforward calculation (calculating the mean of the dimensions, subtracting the mean from the dimensions, calculating the covariance matrix, and identifying the eigenvalues) convinces them that PCA is suitable for predicting evolutionary relationships between samples. What evidence supports the notion that evolutionary relationships can be inferred by merely subtracting the mean of a matrix? There is none, just as there is no statistical power in this method. PCA does not know what the data mean. It can be applied equally to horse race data and a dataset that records how many times Homer Simpson says his catchphrases. PCA is not an evolutionary method; it’s just a linear transformation. If we ask anyone why they trust it, eventually, we will get the answer that with enough tweaking, PCA results produce what the scientist wants to show, and, most importantly, it will be mathematically accurate (and as mathematically accurate as the result of all possible tweaks). There is nothing specific to hominins about it. If your method produces conflicting results by tweaking the number of samples, species, or landmarks, as we showed, your method is worthless. This is what we demonstrated.
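
      For illustration only, the four steps listed above can be sketched in a few lines of Python. This is not code from the manuscript or from MORPHIX; the function name `pca_2d` and the toy data are hypothetical, and the sketch is restricted to two variables so that the eigenvalues of the covariance matrix have a closed form.

```python
import math


def pca_2d(points):
    """Run the PCA steps on a list of (x, y) pairs.

    Returns (mean, covariance, eigenvalues). Hypothetical helper, limited to
    two variables so the eigenvalues can be written in closed form.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")

    # Step 1: mean of each dimension.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Step 2: subtract the mean from each dimension.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Step 3: sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Step 4: eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (half_trace + spread, half_trace - spread)

    return (mean_x, mean_y), ((sxx, sxy), (sxy, syy)), eigenvalues


# Toy data: four perfectly collinear observations (arbitrary numbers).
toy = [(1, 2), (2, 4), (3, 6), (4, 8)]
mean, cov, eig = pca_2d(toy)
explained_pc1 = eig[0] / (eig[0] + eig[1])  # share of variance on the first axis
```

      The function accepts any pairs of numbers. Nothing in the four steps refers to what the variables represent.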

      We would also like to note that if we had easier access to more data, we would have extended our analysis further and shown that the bias exists in other species. As explained in our manuscript, we reached out to several scientists who refused to share their data so that we would not show biases in their studies. As this reviewer is undoubtedly aware of the practices in the field, this criticism is extremely unfair.

      Finally, arguing that our MS dismisses the entire field of geometric morphometrics is also unfair and provocative. We made no such claim. On the contrary, we offer an unbiased method to replace PCA and improve the accuracy of studies in this field.

      We hope this clarifies our position and reinforces the validity of our critique. Thank you for your valuable feedback and for allowing us to address these important points.

      Comment 2a. The article's tone is very argumentative and provocative, and non-necessary superlatives and modifiers are used ("...colourful scatterplots", lines 101, 155, 672). While this is an excellent paper and should be studied by morphometrics experts and probably anyone using PCA, the overall tone does nothing to help. It reads somewhat like a Facebook rant rather than a scientific paper (there is still, we hope, a difference between the two). Please tone it down.

      Again, we thank the reviewer for considering our work excellent. We regret that the reviewer believes that describing colorful (#101) scatterplots as such is a provocation. We do not feel the same way. “Subsumed” (#155) has been suggested to us by an anonymous reviewer. We changed it to “classified” to satisfy the reviewer (However, Schwartz et al. (2014) raised concerns about the phylogenetic inferences based on PCA results of the geometric morphometrics analysis, noting the failure of the method to capture visually obvious differences between the Dmanisi crania and specimens commonly classified under Homo erectus.).  We do not understand the problem with #672, but we revised it to read “However, a growing body of literature criticises the accuracy of various PCA applications, raising concerns about its use in geometric morphometrics.” We hope that this satisfies the reviewer. We made no special effort to be argumentative or provocative. There is no need for that; our results speak for themselves. We did, however, make an effort to communicate the gravity of our findings by citing K. Popper. We do not consider this a provocation.

      Comment 2b. The acronym ML is normally used to denote Maximum Likelihood in the context of phylogenetic studies. The authors use it to denote Machine Learning, which many readers may find confusing (this reviewer took a while to realize that it was not referring to Maximum Likelihood). Perhaps leave "machine learning" written in full.

      We understand that in some contexts, "ML" typically denotes Maximum Likelihood, which can indeed cause confusion. Unfortunately, “ML” is also a well-established acronym for machine learning, and since our paper doesn’t deal with Maximum Likelihood but rather machine learning, we have to choose the latter. Initially, we did spell out "Machine Learning" in full to avoid this confusion. However, upon review, we found that the manuscript's readability and flow were compromised, leading us to revert to the acronym.

      We appreciate your suggestion and understand the importance of clarity. To address this, we will ensure that the first mention of "ML" is accompanied by "Machine Learning" written in full (Line 244). This should help maintain both clarity and readability. Thank you for your valuable input.

      Comment 3. In lines 142, 157 Rohlf's should be Rohlf.

      (Lines 191, 205) We modified it accordingly and replaced "Rohlf's" with "Rohlf".

      Comment 4. The short paragraph in lines 165-167 feels out of place and does not connect to the paragraphs before and after it.

      (Lines 210-223) We modified the introduction and merged that paragraph with a relevant paragraph. The new paragraph reads:

      “PCA’s prominent role in morphometrics analyses and, more generally, physical anthropology is inconsistent with the recent criticisms, raising concerns regarding its validity and, consequently, the value of the results reported in the literature. To assess PCA’s accuracy, robustness, and reproducibility in geometric morphometric analysis, particularly its potential biases and inconsistencies in clustering with species taxonomy for phylogenetic reconstruction, we utilised a benchmark database containing landmarks from six known species within the Old World monkeys tribe Papionini. We altered this dataset to simulate typical characteristics of paleontological data. We found that PCA’s outcomes lack reliability, robustness, and reproducibility. We also evaluated the argument that a high explained variance could be counted as a measure of reliability (2) and found no association between high explained variance amounts and the subjectiveness of the results. If PCA of morphometric landmark data produces biased results, then landmark-based geometric morphometric studies employing PCA, conservatively estimated to range from 18,400 to 35,200 (as of July 2024) (see Methods), should be reevaluated.”

      We thank the reviewer for the suggestion.

      References

      (1) Gilbert CC, Rossie JB. Congruence of molecules and morphology using a narrow allometric approach. Proceedings of the National Academy of Sciences. 2007;104(29):11910-11914.

      (2) Courtenay LA, Yravedra J, Huguet R, Aramendi J, Maté-González MÁ, González-Aguilera D, et al. Combining machine learning algorithms and geometric morphometrics: a study of carnivore tooth marks. Palaeogeography, Palaeoclimatology, Palaeoecology. 2019;522:28-39.

      (3) Bellin N, Calzolari M, Callegari E, Bonilauri P, Grisendi A, Dottori M, et al. Geometric morphometrics and machine learning as tools for the identification of sibling mosquito species of the Maculipennis complex (Anopheles). Infection, Genetics and Evolution. 2021;95:105034.

      (4) Bookstein FL. Pathologies of between-groups principal components analysis in geometric morphometrics. Evolutionary Biology. 2019;46(4):271-302.

      (5) Elhaik E. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Scientific Reports. 2022;12(1):1-35.

      (6) Cardini A, Polly PD. Cross-validated between group PCA scatterplots: a solution to spurious group separation? Evolutionary Biology. 2020;47(1):85-95.

      (7) Berner D. Size correction in biology: how reliable are approaches based on (common) principal component analysis? Oecologia. 2011;166(4):961-971.

    1. eLife Assessment

      The authors examined the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. By using computational methods, they showed that viral fitness declines as the virus mutates to escape the immune response and can rebound later in infection as HCV accumulates additional mutations. The study contributes to an important aspect of viral evolution. The combination of approaches is highly compelling; however, some aspects of the manuscript are incomplete and would greatly benefit from additional revision, mainly to increase their clarity.

    2. Reviewer #1 (Public review):

      Summary:

      The authors examine CD8 T cell selective pressure in early HCV infection using. They propose that after initial CD8-T mediated loss of virus fitness, in some participants around 3 months after infection, HCV acquires compensatory mutations and improved fitness leading to virus progression.

      Strengths:

      Throughout the paper, the authors apply well-established approaches in studies of acute to chronic HIV infection for studies of HCV infection. This lends rigor to the authors' work.

      Weaknesses:

      (1) The Discussion could be strengthened by a direct discussion of the parallels/differences in results between HIV and HCV infections in terms of T cell selection, entropy, and fitness.

      (2) In the Results, please describe the Barton model functionality and why the fitness landscape model was most applicable for studies of HCV viral diversity.

      (3) Recognize the caveats of the HCV mapping data presented.

      (4) The authors should provide more data or cite publications to support the authors' statement that HCV-specific CD8 T cell responses decline following infection.

      (5) Similarly, as the authors' measurements of HCV T and humoral responses were not exhaustive, the text describing the decline of T cells with the onset of humoral immunity needs caveats or more rigorous discussion with citations (Discussion lines 319-321).

      (6) What role does antigen drive play in these data - for both T cell and antibody induction?

      (7) Figure 3 - are the X and Y axes wrongly labelled? The divergent ranges of population fitness do not make sense.

      (8) Figure S3 - is the green line, average virus fitness?

      (9) Use the term antibody epitopes, not B cell epitopes.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Walker and collaborators study the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They focus in particular on the interplay between HCV and the immune system, including the accumulation of mutations in CD8+ T cell epitopes to evade immunity. Using a computational method to estimate the fitness effects of HCV mutations, they find that viral fitness declines as the virus mutates to escape T-cell responses. In long-term infections, they found that viral fitness can rebound later in infection as HCV accumulates additional mutations.

      Strengths:

      This work is especially interesting for several reasons. Individuals who developed chronic infections were followed over fairly long times and, in most cases, samples of the viral population were obtained frequently. At the same time, the authors also measured CD8+ T cell and antibody responses to infection. The analysis of HCV evolution focused not only on variation within particular CD8+ T cell epitopes but also on the surrounding proteins. Overall, this work is notable for integrating information about HCV sequence evolution, host immune responses, and computational metrics of fitness and sequence variation. The evidence presented by the authors supports the main conclusions of the paper described above.

      Weaknesses:

      One notable weakness of the present version of the manuscript is a lack of clarity in the description of the method of fitness estimation. In the previous studies of HIV and HCV cited by the authors, fitness models were derived by fitting the model (equation between lines 435 and 436) to viral sequence data collected from many different individuals. In the section "Estimating survival fitness of viral variants," it is not entirely clear if Walker and collaborators have used the same approach (i.e., fitting the model to viral sequences from many individuals), or whether they have used the sequence data from each individual to produce models that are specific to each subject. If it is the former, then the authors should describe where these sequences were obtained and the statistics of the data.

      If the fitness models were inferred based on the data from each subject, then more explanation is needed. In prior work, the use of these models to estimate fitness was justified by arguing that sequence variants common to many individuals are likely to be well-tolerated by the virus, while ones that are rare are likely to have high fitness costs. This justification is less clear for sequence variation within a single individual, where the viral population has had much less time to "explore" the sequence landscape. Nonetheless, there is precedent for this kind of analysis (see, e.g., Asti et al., PLoS Comput Biol 2016). If the authors took this approach, then this point should be discussed clearly and contrasted with the prior HIV and HCV studies.

      Another important point for clarification is the definition of fitness. In the abstract, the authors note that multiple studies have shown that viral escape variants can have reduced fitness, "diminishing the survival of the viral strain within the host, and the capacity of the variant to survive future transmission events." It would be helpful to distinguish between this notion of fitness, which has sometimes been referred to as "intrinsic fitness," and a definition of fitness that describes the success of different viral strains within a particular individual, including the potential benefits of immune escape. In many cases, escape variants displace variants without escape mutations, showing that their ability to survive and replicate within a specific host is actually improved relative to variants without escape mutations. However, escape mutations may harm the virus's ability to replicate in other contexts. Given the major role that fitness plays in this paper, it would be helpful for readers to clearly discuss how fitness is defined and to distinguish between fitness within and between hosts (potentially also mentioning relevant concepts such as "transmission fitness," i.e., the relative ability of a particular variant to establish new infections).

      One concern about the analysis is in the test of Shannon entropy as a way to quantify the rate of escape. The authors describe computing the entropy at multiple time points preceding the time when escape mutations were observed to fix in a particular epitope. Which entropy values were used to compare with the escape rate? If just the time point directly preceding the fixation of escape mutations, could escape mutations have already been present in the population at that time, increasing the entropy and thus drawing an association with the rate of escape? It would also be helpful for readers to include a definition of entropy in the methods, in addition to a reference to prior work. For example, it is not clear what is being averaged when "average SE" is described.
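
      For reference, one common definition of the Shannon entropy at an alignment site is H = -Σ p_a log2(p_a), where p_a is the frequency of residue a at that site. The sketch below is a minimal illustration of that definition, not the authors' implementation. The function names are hypothetical, and averaging over sites is an assumption, since what is averaged in "average SE" is precisely the open question.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one alignment site."""
    counts = Counter(column)
    total = sum(counts.values())
    # Sum of p * log2(1/p), which equals -sum(p * log2(p)).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def average_entropy(sequences):
    """Mean per-site entropy over equal-length aligned sequences.

    Assumption: the average is taken over sites within one time point.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    per_site = [shannon_entropy([s[i] for s in sequences]) for i in range(length)]
    return sum(per_site) / length


# Toy epitope alignment (made-up sequences): only the second site varies.
toy_alignment = ["AC", "AG", "AC", "AG"]
site_entropies = [shannon_entropy([s[i] for s in toy_alignment]) for i in range(2)]
mean_entropy = average_entropy(toy_alignment)
```

      Under this definition, a site where an escape variant has already reached intermediate frequency shows high entropy, which is why the choice of time point matters for the comparison with escape rate.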

    1. eLife Assessment

      This manuscript offers an exploration of the immune cells in the oyster Crassostrea gigas, by correlating distinct hemocyte morphotypes with specific single-cell transcriptional profiles. The evidence supporting the conclusion is convincing, deriving from the comprehensive dataset that not only captures unicellular diversity but also associates these cells with distinct immune roles, making it an important resource for the broader research community. There are some concerns on the data presentation that leave some questions.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, De La Forest Divonne et al. build a repertory of hemocytes from adult Pacific oysters combining scRNAseq data with cytologic and biochemical analyses. Three categories of hemocytes were described previously in this species (i.e. blast, hyalinocyte, and granulocytes). Based on scRNAseq data, the authors identified 7 hemocyte clusters presenting distinct transcriptional signatures. Using Kegg pathway enrichment and RBGOA, the authors determined the main molecular features of the clusters. In parallel, using cytologic markers, the authors classified 7 populations of hemocytes (i.e. ML, H, BBL, ABL, SGC, BGC, and VC) presenting distinct sizes, nucleus sizes, acidophilic/basophilic, presence of pseudopods, cytoplasm/nucleus ratio and presence of granules. Then, the authors compared the phenotypic features with potential transcriptional signatures seen in the scRNAseq. The hemocytes were separated in a density gradient to enrich for specific subpopulations. The cell composition of each cell fraction was determined using cytologic markers and the cell fractions were analysed by quantitative PCR targeting major cluster markers (two per cluster). With this approach, the authors could assign cluster 7 to VC, cluster 2 to H, and cluster 3 to SGC. The other clusters did not show a clear association with this experimental approach. Using phagocytic assays, ROS, and copper monitoring, the authors showed that ML and SGC are phagocytic, ML produces ROS, and SGC and BGC accumulate copper. Then with the density gradient/qPCR approach, the authors identified the populations expressing anti-microbial peptides (ABL, BBL, and H). At last, the authors used Monocle to predict differentiation trajectories for each subgroup of hemocytes using cluster 4 as the progenitor subpopulation.

      The manuscript provides a comprehensive characterisation of the diversity of circulating immune cells found in Pacific oysters.

      Strengths:

      The combination of the two approaches offers a more integrative view.

      Hemocytes represent a very plastic cell population that has key roles in homeostatic and challenged conditions. Grasping the molecular features of these cells at the single-cell level will help understand their biology.

      This type of study may help elucidate the diversification of immune cells in comparative studies and evolutionary immunology.

      Weaknesses:

      The study should be more cautious about the conclusions, include further analyses, and inscribe the work in a more general framework.

    3. Reviewer #2 (Public review):

      Summary:

      This work provides a comprehensive understanding of cellular immunity in bivalves. To precisely describe the hemocytes of the oyster C. gigas, the authors morphologically characterized seven distinct cell groups, which they then correlated with single-cell RNA sequencing analysis, also resulting in seven transcriptional profiles. They employed multiple strategies to establish relationships between each morphotype and the scRNAseq profile. The authors correlated the presence of marker genes from each cluster identified in scRNAseq with hemolymph fractions enriched for different hemocyte morphotypes. This approach allowed them to correlate three of the seven cell types, namely hyalinocytes (H), small granule cells (SGC), and vesicular cells (VC). A macrophage-like (ML) cell type was correlated through the expression of macrophage-specific genes and its capacity to produce reactive oxygen species. Three other cell types correspond to blast-like cells, including an immature blast cell type from which distinct hematopoietic lineages originate to give rise to H, SGC, VC, and ML cells. Additionally, ML cells and SGCs demonstrated phagocytic properties, with SGCs also involved in metal homeostasis. On the other hand, H cells, non-granular cells, and blast cells expressed antimicrobial peptides. This study thus provides a complete landscape of oyster hemocytes with functional validation linked to immune activities. This resource will be valuable for studying the impact of bacterial or viral infections in oysters.

      Strengths:

      The main strength of this study lies in its comprehensive and integrative approach, combining single-cell RNA sequencing, cytological analysis, cell fractionation, and functional assays to provide a robust characterization of hemocyte populations in Crassostrea gigas.

      (1) The innovative use of marker genes, quantifying their expression within specific cell fractions, allows for precise annotation of different cellular clusters, bridging the gap between morphological observations and transcriptional profiles.

      (2) The study provides detailed insights into the immune functions of different hemocyte types, including the identification of professional phagocytes, ROS-producing cells, and cells expressing antimicrobial peptides.

      (3) The identification and analysis of transcription factors specific to different hemocyte types and lineages offer crucial insights into cell fate determination and differentiation processes in oyster immune cells.

      (4) The authors significantly advance the understanding of oyster immune cell diversity by identifying and characterizing seven distinct hemocyte transcriptomic clusters and morphotypes.

      These strengths collectively make this study a significant contribution to the field of invertebrate immunology, providing a comprehensive framework for understanding oyster hemocyte diversity and function.

      Weaknesses:

      (1) The authors performed scRNAseq/lineage analysis and cytological analysis on oysters from two different sources. The methodology of the study raises concerns about the consistency of the sample and the variability of the results. The specific post-processing of hemocytes for scRNAseq, such as cell filtering, might also affect cell populations or gene expression profiles. It's unclear if the seven hemocyte types and their proportions were consistent across both samples. This inconsistency may affect the correlation between morphological and transcriptomic data.

      (2) The authors claim to use pathogen-free adult oysters (lines 95 and 119), but no supporting data is provided. It's unclear if the oysters were tested for bacterial and viral contaminations, particularly Vibrio and OsHV-1 μVar herpesvirus.

      (3) The KEGG and Gene Ontology analyses, while informative, are very descriptive and lack interpretation. The use of heatmaps with dendrograms for grouping cell clusters and GO terms is not discussed in the results, missing an opportunity to explore cell-type relationships. The changing order of cell clusters across panels B, C, and D in Figure 2 makes it challenging to correlate with panel A and to compare across different GO term categories. The dendrograms suggest proximity between certain clusters (e.g., 4 and 1) across different GO term types, implying similarity in cell processes, but this is not discussed. Grouping GO terms as in Figure 2A, rather than by dendrogram, might provide a clearer visualization of main pathways. Lastly, a more integrated discussion linking GO term and KEGG pathway analyses could offer a more comprehensive view of cell type characteristics. The presentation of scRNAseq results lacks depth in interpretation, particularly regarding the potential roles of different cell types based on their transcriptional profiles and marker genes. Additionally, some figures (2B, C, D, and 7C to H) suffer from information overload and small size, further hampering readability and interpretation.

      (4) The pseudotime analysis presented in the study provides modest additional information to what is already manifest from the clustering and UMAP visualization. The central and intermediate transcriptomic profile of cluster 4 relative to other clusters is apparent from the UMAP and the expression of shared marker genes across clusters (as shown in Figure 1D). The statement by the authors that 'the two types of professional phagocytes belong to the same granular cell lineage' (lines 594-596) should be formulated with more caution. While the pseudotime trajectory links macrophage-like (ML) and small granule-like (SGC) cells, this doesn't definitively establish a direct lineage relationship. Such trajectories can result from similarities in gene expression induced by factors other than lineage relationships, such as responses to environmental stimuli or cell cycle states. To conclusively establish this lineage relationship, additional experiments like cell lineage tracing would be necessary, if such tools are available for C. gigas.

      (6) Given the mention of herpesvirus as a major oyster pathogen, the lack of discussion on genes associated with antiviral immunity is a notable omission. While KEGG pathway analysis associated herpesvirus with cluster 1, the specific genes involved are not elaborated upon.

      (7) The discussion misses an opportunity for comparative analysis with related species. Specifically, a comparison of gene markers and cell populations with Crassostrea hongkongensis could highlight similarities and differences across systems.

      Conclusion:

      The authors largely achieved their primary objective of providing a comprehensive characterization of oyster immune cells. They successfully integrated multiple approaches to identify and describe distinct hemocyte types. The correlation of these cell types with specific immune functions represents a significant advancement in understanding oyster immunity. However, certain aspects of their objectives have not been fully achieved. The lineage relationships proposed on the basis of pseudotime analysis, while interesting, require further experimental validation. The potential of antiviral defense mechanisms, an important aspect of oyster immunity, has not been discussed in depth.

      This study is likely to have a significant impact on the field of invertebrate immunology, particularly in bivalve research. It provides a new standard for comprehensive immune cell characterization in invertebrates. The identification of specific markers for different hemocyte types will facilitate future research on oyster immunity. The proposed model of hemocyte lineages, while requiring further validation, offers a framework for studying hematopoiesis in bivalves.

    4. Reviewer #3 (Public review):

      The paper addresses pivotal questions concerning the multifaceted functions of oyster hemocytes by integrating single-cell RNA sequencing (scRNA-seq) data with analyses of cell morphology, transcriptional profiles, and immune functions. In addition to investigating granulocyte cells, the study delves into the potential roles of blast and hyalinocyte cells. A key discovery highlighted in this research is the identification of cell types engaged in antimicrobial activities, encompassing processes such as phagocytosis, intracellular copper accumulation, oxidative bursts, and antimicrobial peptide synthesis.

      A particularly intriguing aspect of the study lies in the exploration of hemocyte lineages, warranting further investigation, such as employing scRNA-seq on embryos at various developmental stages.

      In the opinion of this reviewer, the discussion should compare and contrast the transcriptome characteristics of hemocytes, particularly granule cells, across the three species of bivalves, aligning with the published scRNA-seq studies in this field to elucidate the uniformities and variances in bivalve hemocytes.

    1. eLife Assessment

      This important study provides evidence for the role of neutrophil extracellular traps in chronic kidney damage (CKD) induced by chemotherapy and suggests a therapeutic approach to mitigate the kidney pathology caused by the NETs. The study utilizes a sound murine in vivo model of CKD with low-dose cisplatin administration and a genetic model for impairment of NET formation by deletion of the enzyme Pad4. In its current form, the study was seen as incomplete as there is not yet a formal demonstration of NET production by neutrophils in the model of CKD used. Additionally, the accuracy and clarity of data presentation could be improved.

    2. Reviewer #1 (Public review):

      Summary:

      Chemotherapy-induced chronic kidney injury is a significant and growing concern, as it can lead to long-term renal damage and compromised kidney function. The authors have highlighted an important aspect of this issue by evaluating the potential protective effects of OPCs against cisplatin-induced kidney injury. They propose that OPCs may mitigate renal damage by reducing NET formation, which could improve kidney function.

      Strengths:

      The study addressed a significant issue in the field of chemotherapy-induced kidney injury. The use of multiple markers and experimental methods provided a comprehensive exploration of the impact of OPCs on kidney damage. This approach allowed for a nuanced understanding of how OPCs might mitigate renal injury by reducing NET formation and improving kidney function.

      Weaknesses:

      The hypothesis is intriguing and relevant. However, the study encounters challenges, such as incomplete evidence and discrepancies between the text and data. Addressing these issues is crucial to improving the overall study's conclusions. The paper can potentially advance the understanding of therapeutic strategies for chemotherapy-induced kidney injury. Nonetheless, a clearer presentation of the data is necessary for it to have a substantial impact.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to understand the mechanisms underlying chronic kidney disease (CKD) induced by cisplatin treatment. Acute or chronic kidney diseases are major adverse effects of cisplatin chemotherapy for cancer, which limits the treatment's efficacy. Understanding the disease's genesis is fundamental to identifying targets for preventing or treating these conditions.

      Strengths:

      The authors employed an in vivo model of cisplatin-induced chronic kidney disease (CKD) in mice, which displayed similar adverse effects of the therapy as seen in humans. The model, called repeated low-dose cisplatin (RLDC), caused similar tissue and functional damage in the kidneys, led to harmful effects on the intestines by altering the microbiota and epithelial cell barrier, and impaired systemic vascular blood flow.

      The authors demonstrated that the detrimental effects on the intestinal barrier led to the release of bacterial compounds into the circulation, which, in association with reactive oxygen species formed by the inflammatory and oxidative action of cisplatin, activated blood and kidney neutrophils to release neutrophil extracellular traps (NETs). In turn, they suggested that circulating NETs migrated into kidney tissue, causing damage. Moreover, they showed that NETs are capable of trapping coagulation factors responsible for impaired systemic blood flow.

      These conclusions were primarily based on reduced CKD symptoms and vascular damage in genetically modified animals that do not form NETs, as well as the observation that a bacterial compound (lipopolysaccharide) associated with cisplatin induces NET formation in isolated neutrophils. Moreover, treating animals with an anti-inflammatory and antioxidant natural compound simultaneously with cisplatin administration abolished the harmful effects on the kidneys and intestines.

      The authors conclude that the intestinal damage and inflammatory properties of cisplatin lead to NET release, which, in turn, is responsible for the kidney and vascular damage evoked by cisplatin treatment.

      Hence, the manuscript employs a well-designed experimental model and covers several important manifestations of cisplatin toxicity. It also uses genetically deficient mice to demonstrate the involvement of NETs in the development of chronic kidney disease (CKD).

      Weaknesses:

      Overall, the work was well executed. However, a few aspects require additional experiments to confirm the conclusions. The involvement of NETs in the genesis of CKD is unquestionable; nonetheless, the roles of locally induced versus circulating NETs, as well as the translation of in vitro NET release to in vivo CKD genesis, need further evaluation. Additionally, the primary mechanism of the natural anti-inflammatory compound used appears to be antioxidative, which does not promote the formation of reactive oxygen species necessary for NET formation. This is not clear from the title.

    1. eLife Assessment

      This important study substantially advances our understanding of noncoding somatic mutations by identifying a novel class of mutations that affect 3'UTR polyadenylation signals enriched in tumor suppressor genes in cancer. The evidence supporting the conclusions is convincing, with rigorous statistical analyses. The work will be of broad interest to cancer researchers.

    2. Reviewer #1 (Public Review):

      Kainov et al investigated the prevalence of mutations in 3'UTR that affect gene expression in cancer to identify noncoding cancer drivers.

      The authors used data from normal controls (1000 genome data) and compared it to cancer data (PCAWG). They found that in cancer 3'UTR mutations had a stronger effect on cleavage than in the normal population. These mutations are negatively selected in the normal population and positively selected in cancers. The authors used the PCAWG data set to identify such mutations and found that the mutations that lead to a reduction of gene expression are enriched in tumor suppressor genes and those that lead to an increase in gene expression are enriched in oncogenes. 3'UTR mutations that reduce gene expression or occur in TSGs co-occur with non-synonymous mutations. The authors then validate the effect of 3'UTR mutations experimentally using a luciferase reporter assay. These data identify a novel class of noncoding driver genes with mutations in 3'UTR that impact polyadenylation and thus gene expression.

      This is an elegant study with fundamental insight into identifying cancer driver genes. The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be extended.

      Comments on revisions:

      The authors addressed most of my comments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Kainov et al investigated the prevalence of mutations in 3'UTR that affect gene expression in cancer to identify noncoding cancer drivers.

      The authors used data from normal controls (1000 genome data) and compared it to cancer data (PCAWG). They found that in cancer 3'UTR mutations had a stronger effect on cleavage than in the normal population. These mutations are negatively selected in the normal population and positively selected in cancers. The authors used the PCAWG data set to identify such mutations and found that the mutations that lead to a reduction of gene expression are enriched in tumor suppressor genes and those that lead to an increase in gene expression are enriched in oncogenes. 3'UTR mutations that reduce gene expression or occur in TSGs co-occur with non-synonymous mutations. The authors then validate the effect of 3'UTR mutations experimentally using a luciferase reporter assay. These data identify a novel class of noncoding driver genes with mutations in 3'UTR that impact polyadenylation and thus gene expression.

      This is an elegant study with fundamental insight into identifying cancer driver genes. The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be extended.

      We thank the reviewer for the positive assessment of our work and constructive comments.

      (1) It would be important for the authors to show if the findings of this study hold for metastatic cancers since most deaths occur due to metastasis and tumor heterogeneity changes when cancer progresses to metastasis. The authors should use the Hartwig data and show if metastatic cancers are enriched for 3'UTR mutations.

      This is a good suggestion, but we believe that the proposed analysis would have a significantly stronger impact in the context of a separate study focused specifically on longitudinal changes in the somatic mutation landscape as cancer progresses from primary tumours to metastases. Conducting such a study would require obtaining permissions to use relevant controlled datasets and, ideally, collaborating with oncologists to generate additional genome and transcriptome sequencing data. As such, this level of analysis would go beyond the current scope of our work.

      (2) Figure 2 should show the distribution of 3'UTR mutations by cancer type especially since authors go on to use colorectal cancer only for validations. It would be helpful to bring Figures S3A and S3C to this panel since these findings make the connections to cancer biology. Are any molecular functions enriched in addition to biological processes? Are kinases, phosphatases, etc more or less affected by 3'UTR mutations?

      As suggested, we have added a pie chart showing the distribution of 3’UTR mutations by cancer type (new Fig. 2E). Notably, nearly half of the mutations in our dataset were of colorectal adenocarcinoma origin, justifying the focus on this type of cancer in our subsequent validation analyses.

      To strengthen the connections to cancer biology, we moved Fig. S3A and S3C to the main text. It was more logical to integrate these panels into Fig. 3 rather than Fig. 2. We also analysed molecular function enrichment in Fig. 3E. Consistent with the biological process enrichment (now shown in Fig. 3D), this revealed an enrichment of proteins interacting with the ubiquitination pathway, including tumour suppressors SMAD2, APC and AXIN1.

      (3) Figure 3 looks at the co-occurrence of 3'UTR mutations with non-synonymous mutations but what about copy number change? You would expect the loss of the other allele to be enriched. Along the same line, are these data phased? Do you know that the nonsynonymous mutations are in the other allele or in the same allele that shows 3'UTR mutation?

      As suggested, we have analysed copy number variation data. As mentioned in the revised Results, this "showed that increased copy number was 4.1-times more common in the PCAWG data compared to allele loss. However, the incidence of copy number increase was substantially lower in the DOWN-paSNV group compared to the BG-paSNV control (Fig. S6). This points to a negative selection against duplications of genes affected by DOWN-paSNVs in cancer".

      Phasing somatic mutations in cancer samples is challenging due to high genetic heterogeneity of tumour cells. This situation will likely improve in the near future with the increased use of long-read sequencing. However, with currently available data, there is no straightforward method to determine whether mutations co-occur in the same cell. We have added a note on this in the Discussion section: "As long-read genomic sequencing data become increasingly available, it will be interesting to investigate whether these additional mutations occur in the same or in a different allele compared to the DOWN-paSNVs".

      Reviewer #2 (Public Review):

      Summary:

      To evaluate whether somatic mutations in cancer genomes are enriched with mutations in polyadenylation signal regions, the authors analyzed 1000 genomes data and PCAWG data as a control and experimental set, respectively. They observed increased enrichment of somatic mutations that may affect the function of polyA signals and confirmed that these mutations may influence the expression of the gene through a minigene expression experiment.

      Strengths:

      This study provides a systematic evaluation of polyA signal, which makes it valuable. Overall, the analytic approach and results are solid and supported by experimental validation.

      Thank you.

      Weaknesses:

      (1) This study uses APARENT2 as a tool to evaluate functional alteration in polyA signal sequences. Based on the original paper and the results shown in this paper, the algorithm appears to be of high quality. However, the whole study is dependent on the output of APARENT2. Therefore, it would be nice to (a) run and show a positive control run, which can show that the algorithm works well, and (b) describe the rationale for selecting this algorithm in the main text.

      As suggested, we have added control analyses to Fig. S1A-B, which show that APARENT2 performs well in our hands. We have described the rationale for using APARENT2 in the Results as follows: "For each paSNV, we calculated the change in cleavage/polyadenylation efficiency using the APARENT2 neural network model, which has been shown to infer this statistic more accurately than earlier approaches [Ref23]".

      (2) Are there recurrent somatic mutation calls (= exactly the same mutation across different tumor samples) in the poly(A) region of certain genes?

      We indeed see several cases where the same cleavage/polyadenylation signal is affected by the same or different DOWN mutations in different cancer samples. This finding is now summarized in the Results section and Table S1 as follows: "In several cases, including LRP1B and FOXO1, which are known to act as tumour suppressors in certain cancers, the same cleavage/polyadenylation signal was disrupted by the same or different mutations in more than one sample (see columns Mut_Recurrence and Signal_Recurrence in Table S1)".

      (3) The authors nicely showed that the minigene with A>G mutation altered gene expression. Maybe one can reach a similar conclusion by analyzing a cancer dataset that has mutation and gene expression data? That is, genes with or without polyA mutations show different expression levels.

      The data presented in Fig. 5A-B show that DOWN-paSNV mutations have a negative effect on the expression of endogenous tumour suppressor genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figures should be numbered in order. For example, Figure S3C is referred to in the text before S3A-B, etc.

      We have proofread the text to fix this problem.

      Adding a supplementary file with lists of genes carrying 3'UTR mutations split by effect on gene expression and cancer type would be very useful for the community.

      We now show this in Table S1, with the caveat that we could not consistently investigate the effect of DOWN-paSNV on gene expression since the transcriptomics data are not available for all cancers.

      Spelling mistake in Figure 1A - genone should be genome.

      Fixed - thank you.

      Typo in Figure 1B x-axis label +50nt should be -50nt to the left of the dashed line.

      Fixed - thank you.

      All figures use E to denote x10 but it would make the figures more readable if authors used the standard notation (x10) for all numbers with exponents and base 10.

      Done.

    1. eLife Assessment

      This paper presents valuable findings on how autophagosomes are positioned along microtubules for their efficient fusion with lysosomes, providing significant insights into the mechanism. The evidence supporting the conclusions is solid, with high-quality fluorescence microscopy combined with Drosophila genetics. This work will be of broad interest to cell biologists interested in autophagy and related cell biology fields.

    2. Reviewer #1 (Public review):

      Summary:

      It is well known that autophagosomes/autolysosomes move along microtubules. However, because previous studies did not distinguish between autophagosomes and autolysosomes, it remains unknown whether autophagosomes begin to move after fusion with lysosomes or even before fusion. In this manuscript, the authors show, using fusion-deficient cells, that both pre-fusion autophagosomes and lysosomes can move along the MT toward the minus end. By screening motor proteins and Rabs, the authors found that autophagosomal traffic is primarily regulated by the dynein-dynactin system and can be counter-regulated by kinesins. They also show that Rab7-Epg5 and Rab39-ema interactions are important for autophagosome trafficking.

      Strengths:

      This study uses reliable Drosophila genetics and high-quality fluorescence microscopy. The data are properly quantified and statistically analyzed. It is a reasonable hypothesis that gathering pre-fusion autophagosomes and lysosomes in close proximity improves fusion efficiency.

      Weaknesses:

      (1) To distinguish autophagosomes from autolysosomes, the authors used vps16 RNAi cells, which are supposed to be fusion deficient. However, the extent to which fusion is actually inhibited by knockdown of Vps16A is not shown. The co-localization rate of Atg8 and Lamp1 should be shown (as in Figure 8). Then, after identifying pre-fusion autophagosomes and lysosomes, the localization of each should be analyzed. It is also possible that autophagosomes and lysosomes are tethered by factors other than HOPS (even if they are not fused). If this is the case, autophagosomal trafficking would be affected by the movement of lysosomes.

      (2) The authors analyze autolysosomes in Figures 6 and 7. This is based on the assumption that autophagosome-lysosome fusion takes place in cells without vps16A RNAi. However, even in the presence of Vps16A, both pre-fusion autophagosomes and autolysosomes should exist. This is also true in Figure 8H, where the fusion of autophagosomes and lysosomes is partially suppressed in knockdown cells of dynein, dynactin, Rab7, and Epg5. If the effect of fusion is to be examined, it is reasonable to distinguish between autophagosomes and autolysosomes and analyze only autolysosomes.

      (3) In this study, only vps16a RNAi cells were used to inhibit autophagosome-lysosome fusion. However, since HOPS has many roles besides autophagosome-lysosome fusion, it would be better to confirm the conclusion by knockdown of other factors (e.g., Stx17 RNAi).

      (4) Figure 8: Rab7 and Epg5 are also known to be directly involved in autophagosome-lysosome tethering/fusion. Even if the fusion rate is reduced in the absence of Rab7 and Epg5, it may not be the result of defective autophagosome movement, but may simply indicate that these molecules are required for fusion itself. How do the authors distinguish between the two possibilities?

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Boda et al. describes the results of a targeted RNAi screen in the background of Vps16A-depleted Drosophila larval fat body cells. In this background, lysosomal fusion is inhibited, allowing the authors to analyze the motility and localization specifically of autophagosomes, prior to their fusion with lysosomes to become autolysosomes. In this Vps16A-depleted background, mCherry-Atg8a-labeled autophagosomes accumulate in the perinuclear area, through an unknown mechanism.

      The authors found that the depletion of multiple subunits of the dynein/dynactin complex caused an alteration of this mCherry-Atg8a localization, moving from the perinuclear region to the cell periphery. Interactions with kinesin overexpression suggest these motor proteins may compete for autophagosome binding and transport. The authors extended these findings by examining potential upstream regulators including Rab proteins and selected effectors, and they also examined effects on lysosomal movement and autolysosome size. Altogether, the results are consistent with a model in which specific Rab/effector complexes direct the movement of lysosomes and autophagosomes toward the MTOC, promoting their fusion and subsequent dispersal throughout the cell.

      Strengths:

      Although previous studies of the movement of autophagic vesicles have identified roles for microtubule-based transport, this study moves the field forward by distinguishing between effects on pre- and post-fusion autophagosomes, and by its characterization of the roles of specific Dynein, Dynactin, and Rab complexes in regulating movement of distinct vesicle types. Overall, the experiments are well-controlled, appropriately analyzed, and largely support the authors' conclusions.

      Weaknesses:

      One limitation of the study is the genetic background that serves as the basis for the screening. In addition to preventing autophagosome-lysosome fusion, disruption of Vps16A has been shown to inhibit endosomal maturation and block the trafficking of components to the lysosome from both the endosome and Golgi apparatus. Additional effects previously reported by the authors include increased autophagosome production and reduced mTOR signaling. Thus Vps16A-depleted cells have a number of endosome, lysosome, and autophagosome-related defects, with unknown downstream consequences. Additionally, the cause and significance of the perinuclear localization of autophagosomes in this background is unclear. Thus, interpretations of the observed reversal of this phenotype are difficult, and have the caveat that they may apply only to this condition, rather than to normal autophagosomes. Additional experiments to observe autophagosome movement or positioning in a more normal environment would improve the manuscript.

      Specific comments

      (1) Several genes have been described that when depleted lead to perinuclear accumulation of Atg8-labeled vesicles. There seems to be a correlation of this phenotype with genes required for autophagosome-lysosome fusion; however, some genes required for lysosomal fusion such as Rab2 and Arl8 apparently did not affect autophagosome positioning as reported here. Thus, it is unclear whether the perinuclear positioning of autophagosomes is truly a general response to disruption of autophagosome-lysosome fusion, or may reflect additional aspects of Vps16A/HOPS function. A few things here would help. One would be an analysis of Atg8a vesicle localization in response to the depletion of a larger set of fusion-related genes. Another would be to repeat some of the key findings of this study (effects of specific dynein, dynactin, rabs, effectors) on Atg8a localization when Syx17 is depleted, rather than Vps16A. This should generate a more autophagosome-specific fusion defect. Third, it would greatly strengthen the findings to monitor pre-fusion autophagosome localization without disrupting fusion. Such vesicles could be identified as Atg8a-positive Lamp-negative structures. The effects of dynein and rab depletion on the tracking of these structures in a post-induction time course would serve as an important validation of the authors' findings.

      (2) The authors nicely show that depletion of Shot leads to relocalization of Atg8a to ectopic foci in Vps16A-depleted cells; they should confirm that this is a mislocalized ncMTOC by co-labeling Atg8a with an MTOC component such as MSP300. The effect of Shot depletion on Atg8a localization should also be analyzed in the absence of Vps16A depletion.

      (3) The authors report that depletion of Dynein subunits, either alone (Figure 6) or co-depleted with Vps16A (Figure 2), leads to redistribution of mCherry-Atg8a punctae to the "cell periphery". However, only cell clones that contact an edge of the fat body tissue are shown in these figures. Furthermore, in these cells, mCherry-Atg8a punctae appear to localize only to contact-free regions of these cells, and not to internal regions of clones that share a border with adjacent cells. Thus, these vesicles would seem to be redistributed to the periphery of the fat body itself, not to the periphery of individual cells. Microtubules emanating from the perinuclear ncMTOC have been described as having a radial organization, and thus it is unclear that this redistribution of mCherry-Atg8a punctae to the fat body edge would reflect a kinesin-dependent process as suggested by the authors.

      (4) To validate whether the mCherry-Atg8a structures in Vps16A-depleted cells were of autophagic origin, the authors depleted Atg8a and observed a loss of mCherry-Atg8a signal from the mosaic cells (Figure S1D, J). A more rigorous experiment would be to deplete other Atg genes (not Atg8a) and examine whether these structures persist.

      (5) The authors found that only a subset of dynein, dynactin, rab, and rab effector depletions affected mCherry-Atg8a localization, leading to their suggestion that the most important factors involved in autophagosome motility have been identified here. However, this conclusion has the caveat that depletion efficiency was not examined in this study, and thus any conclusions about negative results should be more conservative.

    4. Reviewer #3 (Public review):

      Summary:

      In multicellular organisms, autophagosomes are formed throughout the cytosol, while late endosomes/lysosomes are relatively confined in the perinuclear region. It is known that autophagosomes gain access to the lysosome-enriched region by microtubule-based trafficking. The mechanism by which autophagosomes move along microtubules remains incompletely understood. In this manuscript, Péter Lőrincz and colleagues investigated the mechanism driving the movement of nascent autophagosomes along the microtubule towards the non-centrosomal microtubule organizing center (ncMTOC) using the fly fat body as a model system. The authors took an approach whereby they examined autophagosome positioning in cells where autophagosome-lysosome fusion was inhibited by knocking down the HOPS subunit Vps16A. Despite being generated at random positions in the cytosol, autophagosomes accumulate around the nucleus when Vps16A is depleted. They then performed an RNA interference screen to identify the factors involved in autophagosome positioning. They found that the dynein-dynactin complex is required for the trafficking of autophagosomes toward ncMTOC. Dynein loss leads to the peripheral relocation of autophagosomes. They further revealed that a pair of small GTPases and their effectors, Rab7-Epg5 and Rab39-ema, are required for bidirectional autophagosome transport. Knockdown of these factors in Vps16a RNAi cells causes the scattering of autophagosomes throughout the cytosol.

      Strengths:

      The data presented in this study help us to understand the mechanism underlying the trafficking and positioning of autophagosomes.

      Weaknesses:

      Major concerns:

      (1) The localization of EPG5 should be determined. The authors showed that EPG5 colocalizes with endogenous Rab7. Rab7 labels late endosomes and lysosomes. Previous studies in mammalian cells have shown that EPG5 is targeted to late endosomes/lysosomes by interacting with Rab7. EPG5 promotes the fusion of autophagosomes with late endosomes/lysosomes by directly recognizing LC3 on autophagosomes and also by facilitating the assembly of the SNARE complex for fusion. In Figure 5I, the EPG5/Rab7-colocalized vesicles are large and they are likely to be lysosomes/autolysosomes.

      (2) The experiments were performed in Vps16A RNAi KD cells. Vps16A knockdown blocks fusion of vesicles derived from the endolysosomal compartments such as fusion between lysosomes. The pleiotropic effect of Vps16A RNAi may complicate the interpretation. The authors need to verify their findings in Stx17 KO cells, as it has a relatively specific effect on the fusion of autophagosomes with late endosomes/lysosomes.

      (3) Quantification should be performed in many places such as in Figure S4D for the number of FYVE-GFP labeled endosomes and in Figures S4H and S4I for the number and size of lysosomes.

      (4) In this study, the transport of autophagosomes is investigated in fly fat cells. In fat cells, a large number of large lipid droplets accumulate and the endomembrane system is distinct from that in other cell types. The knowledge gained from this study may not apply to other cell types. This needs to be discussed.

      Minor concerns:

      (5) Data in some panels are of low quality. For example, the mCherry-Atg8a signal in Figure 5C is hard to see; the input bands of Dhc64c in Figure 5L are smeared.

      (6) In this study, both 3xmCherry-Atg8a and mCherry-Atg8a were used. Different reporters make it difficult to compare the results presented in different figures.

      (7) The small autophagosomes presented in figures such as Figures 1D and 1E are not clear. Enlarged images should be presented.

      (8) The authors showed that Epg5-9xHA coprecipitates with the endogenous dynein motor Dhc64C. Is Rab7 required for the interaction?

      (9) The perinuclear lysosome localization in Epg5 KD cells does not indicate that Epg5 is an autophagosome-specific adaptor.

    5. Author Response:

      Reviewer #1 (Public review):

      Summary:

      It is well known that autophagosomes/autolysosomes move along microtubules. However, because previous studies did not distinguish between autophagosomes and autolysosomes, it remains unknown whether autophagosomes begin to move after fusion with lysosomes or even before fusion. In this manuscript, the authors show, using fusion-deficient cells, that both pre-fusion autophagosomes and lysosomes can move along the MT toward the minus end. By screening motor proteins and Rabs, the authors found that autophagosomal traffic is primarily regulated by the dynein-dynactin system and can be counter-regulated by kinesins. They also show that Rab7-Epg5 and Rab39-ema interactions are important for autophagosome trafficking.

      Strengths:

      This study uses reliable Drosophila genetics and high-quality fluorescence microscopy. The data are properly quantified and statistically analyzed. It is a reasonable hypothesis that gathering pre-fusion autophagosomes and lysosomes in close proximity improves fusion efficiency.

      Thank you for your positive comments and for acknowledging the strengths of our work.

      Weaknesses:

      (1) To distinguish autophagosomes from autolysosomes, the authors used vps16 RNAi cells, which are supposed to be fusion deficient. However, the extent to which fusion is actually inhibited by knockdown of Vps16A is not shown. The co-localization rate of Atg8 and Lamp1 should be shown (as in Figure 8). Then, after identifying pre-fusion autophagosomes and lysosomes, the localization of each should be analyzed.

      Thank you for this comment. We plan to perform an immunohistochemistry experiment on Vps16A KD fat body cells for mCherry and Lamp1, as in the case of the other panels of Figure 8. We will also analyse the distribution of each.

      It is also possible that autophagosomes and lysosomes are tethered by factors other than HOPS (even if they are not fused). If this is the case, autophagosomal trafficking would be affected by the movement of lysosomes.

      We cannot exclude the possibility that autophagosomes are transported indirectly by being tethered to lysosomes. However, we find this unlikely, as we believe that in fat cells lysosomes and autophagosomes rapidly fuse with each other if they get close enough.

      (2) The authors analyze autolysosomes in Figures 6 and 7. This is based on the assumption that autophagosome-lysosome fusion takes place in cells without vps16A RNAi. However, even in the presence of Vps16A, both pre-fusion autophagosomes and autolysosomes should exist. This is also true in Figure 8H, where the fusion of autophagosomes and lysosomes is partially suppressed in knockdown cells of dynein, dynactin, Rab7, and Epg5. If the effect of fusion is to be examined, it is reasonable to distinguish between autophagosomes and autolysosomes and analyze only autolysosomes.

      Thank you for your careful insights. The mCherry-Atg8a reporter we use is highly stable in autolysosomes due to the resilience of the mCherry fluorophore within these acidic, post-fusion structures, making it useful for labelling both autophagosomes and autolysosomes. Notably, the high intensity of mCherry-Atg8a within autolysosomes allows us to distinguish them from pre-fusion autophagosomes, which appear fainter and smaller, especially when accumulated in fusion-defective backgrounds (as shown in Figure 4). We therefore regard larger, brighter structures as autolysosomes.

      To improve clarity, we included additional markers, endogenous Lamp1 staining (Figure 8) and Lamp1-GFP (Figure S9), to help differentiate between autophagic structures. Lamp1-negative, mCherry-Atg8a-positive vesicles indicate pre-fusion autophagosomes, while Lamp1/mCherry-Atg8a double-positive vesicles represent autolysosomes. Additionally, Lamp1-positive, mCherry-Atg8a-negative vesicles mark lysosomes of non-autophagic origin. We appreciate your suggestion.
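
      The marker logic described above can be written out as a small decision rule. The following is only an illustrative sketch: it assumes each punctum has already been scored as positive or negative for mCherry-Atg8a and Lamp1 (how that thresholding is done is not specified here), and the function name and example values are hypothetical.

```python
from collections import Counter


def classify_vesicle(atg8a_positive: bool, lamp1_positive: bool) -> str:
    """Assign a vesicle class from two binary marker calls."""
    if atg8a_positive and lamp1_positive:
        return "autolysosome"
    if atg8a_positive:
        return "pre-fusion autophagosome"
    if lamp1_positive:
        return "non-autophagic lysosome"
    return "unlabelled"


# Hypothetical (Atg8a, Lamp1) calls for five segmented puncta.
puncta = [(True, False), (True, True), (False, True), (True, True), (False, False)]
counts = Counter(classify_vesicle(atg8a, lamp1) for atg8a, lamp1 in puncta)
print(dict(counts))
```

      Counting the classes per cell in this way would give the autophagosome-to-autolysosome ratio that the reviewer asks to see separated.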

      (3) In this study, only vps16a RNAi cells were used to inhibit autophagosome-lysosome fusion. However, since HOPS has many roles besides autophagosome-lysosome fusion, it would be better to confirm the conclusion by knockdown of other factors (e.g., Stx17 RNAi).

      Thank you for this suggestion. We will generate additional Drosophila lines similar to those used in our current study, substituting Syntaxin17, SNAP29 or Vamp7 RNAi for Vps16A RNAi. We will test key phenotypic hits with these new backgrounds to confirm our findings.

      (4) Figure 8: Rab7 and Epg5 are also known to be directly involved in autophagosome-lysosome tethering/fusion. Even if the fusion rate is reduced in the absence of Rab7 and Epg5, it may not be the result of defective autophagosome movement, but may simply indicate that these molecules are required for fusion itself. How do the authors distinguish between the two possibilities?

      Thank you for this comment. While we agree that Rab7 and Epg5 are involved in autophagosome-lysosome tethering and subsequent fusion, we believe they also play an additional role in autophagosome movement. Our hypothesis stems from the observation that the phenotypes of vps16 RNAi and rab7 or epg5 RNAi are not identical. In contrast, RNAi targeting SNARE proteins involved exclusively in fusion (Syx17, SNAP29, and Vamp7) all result in a consistent phenotype: autophagosomes accumulate around the nucleus, closely resembling the phenotype observed with vps16 depletion. This suggests that these SNAREs are specifically involved in fusion. Since Rab7 and Epg5 depletion scatters autophagosomes throughout the cytosol rather than allowing their transport toward the nucleus, we hypothesize that this is due to impaired movement of autophagosomes. This hypothesis is further supported by our co-IP data showing that Epg5 binds to dyneins.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Boda et al. describes the results of a targeted RNAi screen in the background of Vps16A-depleted Drosophila larval fat body cells. In this background, lysosomal fusion is inhibited, allowing the authors to analyze the motility and localization specifically of autophagosomes, prior to their fusion with lysosomes to become autolysosomes. In this Vps16A-depleted background, mCherry-Atg8a-labeled autophagosomes accumulate in the perinuclear area, through an unknown mechanism.

      The authors found that the depletion of multiple subunits of the dynein/dynactin complex caused an alteration of this mCherry-Atg8a localization, moving from the perinuclear region to the cell periphery. Interactions with kinesin overexpression suggest these motor proteins may compete for autophagosome binding and transport. The authors extended these findings by examining potential upstream regulators including Rab proteins and selected effectors, and they also examined effects on lysosomal movement and autolysosome size. Altogether, the results are consistent with a model in which specific Rab/effector complexes direct the movement of lysosomes and autophagosomes toward the MTOC, promoting their fusion and subsequent dispersal throughout the cell.

      Strengths:

      Although previous studies of the movement of autophagic vesicles have identified roles for microtubule-based transport, this study moves the field forward by distinguishing between effects on pre- and post-fusion autophagosomes, and by its characterization of the roles of specific Dynein, Dynactin, and Rab complexes in regulating movement of distinct vesicle types. Overall, the experiments are well-controlled, appropriately analyzed, and largely support the authors' conclusions.

      Thank you for your positive comments and for acknowledging the strengths of our work.

      Weaknesses:

      One limitation of the study is the genetic background that serves as the basis for the screening. In addition to preventing autophagosome-lysosome fusion, disruption of Vps16A has been shown to inhibit endosomal maturation and block the trafficking of components to the lysosome from both the endosome and Golgi apparatus. Additional effects previously reported by the authors include increased autophagosome production and reduced mTOR signaling. Thus Vps16A-depleted cells have a number of endosome, lysosome, and autophagosome-related defects, with unknown downstream consequences. Additionally, the cause and significance of the perinuclear localization of autophagosomes in this background is unclear. Thus, interpretations of the observed reversal of this phenotype are difficult, and have the caveat that they may apply only to this condition, rather than to normal autophagosomes. Additional experiments to observe autophagosome movement or positioning in a more normal environment would improve the manuscript.

      Thank you for highlighting this limitation. We plan to conduct time-lapse imaging of live fat body tissues expressing 3xmCherry-Atg8a and GFP-Lamp1 to visualize the movement and fusion events of pre-fusion autophagosomes (3xmCherry-Atg8a positive and GFP-Lamp1 negative) and lysosomes (GFP-Lamp1 positive). We expect these vesicles to exhibit movement toward the ncMTOC, providing insight into their behaviour under more typical conditions.

      Specific comments

      (1) Several genes have been described that when depleted lead to perinuclear accumulation of Atg8-labeled vesicles. There seems to be a correlation of this phenotype with genes required for autophagosome-lysosome fusion; however, some genes required for lysosomal fusion such as Rab2 and Arl8 apparently did not affect autophagosome positioning as reported here. Thus, it is unclear whether the perinuclear positioning of autophagosomes is truly a general response to disruption of autophagosome-lysosome fusion, or may reflect additional aspects of Vps16A/HOPS function. A few things here would help. One would be an analysis of Atg8a vesicle localization in response to the depletion of a larger set of fusion-related genes. Another would be to repeat some of the key findings of this study (effects of specific dynein, dynactin, rabs, effectors) on Atg8a localization when Syx17 is depleted, rather than Vps16A. This should generate a more autophagosome-specific fusion defect.

      Thank you for this suggestion. We will generate additional Drosophila lines similar to those used in our current study, substituting Syntaxin17, SNAP29, and Vamp7 RNAi for Vps16A RNAi. We will test key phenotypic hits with these new backgrounds to confirm our findings.

      Third, it would greatly strengthen the findings to monitor pre-fusion autophagosome localization without disrupting fusion. Such vesicles could be identified as Atg8a-positive Lamp-negative structures. The effects of dynein and rab depletion on the tracking of these structures in a post-induction time course would serve as an important validation of the authors' findings.

      Thank you for this helpful suggestion. We plan to conduct time-lapse experiments under various conditions (e.g., non-starved and starved at different durations) to monitor the motility of newly formed autophagosomes (3xmCherry-Atg8a positive, Lamp1 negative), allowing us to analyze their positioning dynamics without interference from fusion defects.

      (2) The authors nicely show that depletion of Shot leads to relocalization of Atg8a to ectopic foci in Vps16A-depleted cells; they should confirm that this is a mislocalized ncMTOC by co-labeling Atg8a with an MTOC component such as MSP300. The effect of Shot depletion on Atg8a localization should also be analyzed in the absence of Vps16A depletion.

      Thank you for this positive comment. To confirm the presence of ectopic MTOC foci in Shot KD cells, we plan to co-label with MTOC markers, including Khc-nod-LacZ, and additional reporters like Msps-mCherry, in both Vps16A-depleted and normal backgrounds.

      (3) The authors report that depletion of Dynein subunits, either alone (Figure 6) or co-depleted with Vps16A (Figure 2), leads to redistribution of mCherry-Atg8a punctae to the "cell periphery". However, only cell clones that contact an edge of the fat body tissue are shown in these figures. Furthermore, in these cells, mCherry-Atg8a punctae appear to localize only to contact-free regions of these cells, and not to internal regions of clones that share a border with adjacent cells. Thus, these vesicles would seem to be redistributed to the periphery of the fat body itself, not to the periphery of individual cells. Microtubules emanating from the perinuclear ncMTOC have been described as having a radial organization, and thus it is unclear that this redistribution of mCherry-Atg8a punctae to the fat body edge would reflect a kinesin-dependent process as suggested by the authors.

      Thank you for this detailed observation. Indeed, we frequently observe autophagosomes redistributing to contact-free peripheral regions upon dynein depletion, resulting in an asymmetric distribution. We believe this redistribution to be kinesin-dependent, as shown in Figure 3: kinesin overexpression scatters or shifts autophagosomes to the periphery, while kinesin/dynein double knockdown causes widespread autophagosome scattering. The simplest explanation is that, in dynein's absence, kinesins drive autophagosome movement.

      Additionally, while the radial organization of the microtubule (MT) network has been documented in two independent studies that we referenced, neither study showed MT plus-ends specifically, towards which kinesins transport. It is plausible that, while the MT network appears radial and symmetrical, subtle asymmetry might influence kinesin-dependent transport in fat cells. To explore this further, we will express MT plus-end markers, such as EB1-RFP and EB1-GFP, as well as kinesin reporters like unc-104-GFP or HA-tagged kinesins.

      (4) To validate whether the mCherry-Atg8a structures in Vps16A-depleted cells were of autophagic origin, the authors depleted Atg8a and observed a loss of mCherry-Atg8a signal from the mosaic cells (Figure S1D, J). A more rigorous experiment would be to deplete other Atg genes (not Atg8a) and examine whether these structures persist.

      Thank you for the suggestion to further validate our reporter. We will knock down additional Atg genes, including Atg14, Atg1, Atg6, and Vps34, to confirm that the mCherry-Atg8a-positive structures in the Vps16A RNAi background are indeed of autophagic origin.

      (5) The authors found that only a subset of dynein, dynactin, rab, and rab effector depletions affected mCherry-Atg8a localization, leading to their suggestion that the most important factors involved in autophagosome motility have been identified here. However, this conclusion has the caveat that depletion efficiency was not examined in this study, and thus any conclusions about negative results should be more conservative.

      Thank you for this constructive feedback. We agree and will adjust our conclusions based on the negative results in the revised manuscript to account for the potential variability in depletion efficiency.

      Reviewer #3 (Public review):

      Summary:

      In multicellular organisms, autophagosomes are formed throughout the cytosol, while late endosomes/lysosomes are relatively confined in the perinuclear region. It is known that autophagosomes gain access to the lysosome-enriched region by microtubule-based trafficking. The mechanism by which autophagosomes move along microtubules remains incompletely understood. In this manuscript, Péter Lőrincz and colleagues investigated the mechanism driving the movement of nascent autophagosomes along the microtubule towards the non-centrosomal microtubule organizing center (ncMTOC) using the fly fat body as a model system. The authors took an approach whereby they examined autophagosome positioning in cells where autophagosome-lysosome fusion was inhibited by knocking down the HOPS subunit Vps16A. Despite being generated at random positions in the cytosol, autophagosomes accumulate around the nucleus when Vps16A is depleted. They then performed an RNA interference screen to identify the factors involved in autophagosome positioning. They found that the dynein-dynactin complex is required for the trafficking of autophagosomes toward the ncMTOC. Dynein loss leads to the peripheral relocation of autophagosomes. They further revealed that a pair of small GTPases and their effectors, Rab7-Epg5 and Rab39-ema, are required for bidirectional autophagosome transport. Knockdown of these factors in Vps16A RNAi cells causes the scattering of autophagosomes throughout the cytosol.

      Strengths:

      The data presented in this study help us to understand the mechanism underlying the trafficking and positioning of autophagosomes.

      Thank you for your positive comment and for acknowledging the strengths of our work.

      Major concerns:

      (1) The localization of EPG5 should be determined. The authors showed that EPG5 colocalizes with endogenous Rab7. Rab7 labels late endosomes and lysosomes. Previous studies in mammalian cells have shown that EPG5 is targeted to late endosomes/lysosomes by interacting with Rab7. EPG5 promotes the fusion of autophagosomes with late endosomes/lysosomes by directly recognizing LC3 on autophagosomes and also by facilitating the assembly of the SNARE complex for fusion. In Figure 5I, the EPG5/Rab7-colocalized vesicles are large and they are likely to be lysosomes/autolysosomes.

      Thank you for suggesting an improvement to our Epg5 localization data. We plan to perform triple-staining experiments with autophagy and lysosome markers, such as Atg8a and Lamp1, together with Epg5-9xHA to provide a clearer context for Epg5 localization.

      (2) The experiments were performed in Vps16A RNAi KD cells. Vps16A knockdown blocks fusion of vesicles derived from the endolysosomal compartments such as fusion between lysosomes. The pleiotropic effect of Vps16A RNAi may complicate the interpretation. The authors need to verify their findings in Stx17 KO cells, as it has a relatively specific effect on the fusion of autophagosomes with late endosomes/lysosomes.

      Thank you for this valuable suggestion. We will create similar Drosophila lines as used in our study but will now employ Syntaxin17, SNAP29, or Vamp7 RNAi. We will cross our most significant hits with these new lines to confirm our findings.

      (3) Quantification should be performed in many places such as in Figure S4D for the number of FYVE-GFP labeled endosomes and in Figures S4H and S4I for the number and size of lysosomes.

      Thank you for pointing this out. We will perform the suggested quantifications and statistics.

      (4) In this study, the transport of autophagosomes is investigated in fly fat cells. In fat cells, a large number of large lipid droplets accumulate and the endomembrane systems are distinct from those in other cell types. The knowledge gained from this study may not apply to other cell types. This needs to be discussed.

      Thank you for this insight. We will discuss the potential cell-type specificity of our findings in the revised manuscript. Additionally, we plan to examine the distribution of the mCherry-Atg8a reporter in the vps16A RNAi background in other cell types, such as salivary gland cells, to broaden our analysis.

      Minor concerns:

      (5) Data in some panels are of low quality. For example, the mCherry-Atg8a signal in Figure 5C is hard to see; the input bands of Dhc64C in Figure 5L are smeared.

      Thank you for noting this. We will repeat the experiment in Figure 5C to obtain clearer images. The smeared Dhc64C input bands in Figure 5L are due to the large size of this protein, which affects its migration characteristics. We will address this in the revised manuscript.

      (6) In this study, both 3xmCherry-Atg8a and mCherry-Atg8a were used. Different reporters make it difficult to compare the results presented in different figures.

      Thank you for this comment. Both reporters are well-established as autophagic markers and function similarly. However, to reduce confusion, we have used only one type per figure to ensure comparability of results.

      (7) The small autophagosomes presented in figures such as Figures 1D and 1E are not clear. Enlarged images should be presented.

      Thank you for your suggestion. We will repeat these experiments and provide higher-quality, enlarged images for clarity.

      (8) The authors showed that Epg5-9xHA coprecipitates with the endogenous dynein motor Dhc64C. Is Rab7 required for the interaction?

      Thank you for this question. We will investigate this by co-transfecting the cells with WT and GTP- or GDP-locked Rab7 mutants (which mimic constitutively active and dominant-negative forms, respectively) with Epg5-9xHA. This will allow us to assess whether Rab7 modulates the Epg5-Dhc interaction.

      (9) The perinuclear lysosome localization in Epg5 KD cells provides no indication that Epg5 is an autophagosome-specific adaptor.

      Thank you for this comment. We will moderate our statement regarding Epg5's role as an autophagosome-specific adaptor in the revised manuscript.

    1. eLife Assessment

      The work by Han and collaborators describes valuable findings on the role of Akkermansia muciniphila during ETEC infection. If confirmed, these findings will add to a growing list of beneficial properties of this organism. The strength of the evidence used to justify the conclusions in the manuscript is solid, as the analyses broadly support the claims with only minor weaknesses.

    2. Reviewer #3 (Public review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      As in previous revisions, there remains concerning ambiguity in the methodology used for microbiota sequence analysis and it would be difficult to replicate the analysis in any meaningful way. In this revision, concerns about the rigor and reproducibility of this component of the manuscript have been increased. Readers should be cautious with interpretation of this data.

      (1) In previous versions of the manuscript, it would appear the correct bioproject accession was listed, but the actual link went to an unrelated project. The updated accession link appears to contain raw data; however, the authors state they used an Illumina HiSeq 2500. This would be an unusual choice for V3-V4 as it would not have read lengths long enough to overlap. Inspection of the first sample (SRR19164796) demonstrates that this is absolutely not the raw data, as there is a ~400 nt forward read and a 0-length reverse read. All quality scores are set to 30. There is no logical way to go from HiSeq 2500 raw data and read lengths to what was uploaded to the SRA, and it was certainly not described in the manuscript.

      (2) No multiple testing correction was applied to the microbiome data.
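
      To illustrate what such a correction involves, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to a list of p-values. This is an assumption made for illustration: the review does not say which correction would be appropriate, and the p-values are hypothetical rather than taken from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-taxon p-values from five differential-abundance tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}, adjusted = {q:.4f}")
```

      In this toy example the two raw p-values just below 0.05 no longer pass a 0.05 threshold after adjustment, which is the kind of change an uncorrected analysis would not reveal.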

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Ma X. et al proposed that A. muciniphila was a key strain that promotes the proliferation and differentiation of intestinal stem cells through acting on the Wnt/β-catenin signaling pathway. They used various models, such as piglet model, mouse model and intestinal organoids to address how A. muciniphila and B. fragilis offer the protection against ETEC infection. They showed that FMT with fecal samples, A. muciniphila or B. fragilis protected piglets and/or mice from ETEC infection, and this protection is manifested as reduced intestinal inflammation/bacterial colonization, increased tight junction/Muc2 proteins, as well as proper Treg/Th17 cells. Additionally, they demonstrated that A. muciniphila protected basal-out and/or apical-out intestinal organoids against ETEC infection via Wnt signaling.

      Comments on revised version:

      Please add proper references to indicate the invasion of ETEC into organoids after 1 h of infection.

      We have added references on line 211.

      References:

      Xiao K, Yang Y, Zhang Y, Lv QQ, Huang FF, Wang D, Zhao JC, Liu YL. 2022. Long-chain PUFA ameliorate enterotoxigenic Escherichia coli-induced intestinal inflammation and cell injury by modulating pyroptosis and necroptosis signaling pathways in porcine intestinal epithelial cells. Br. J. Nutr. 128(5):835-850.

      Qian MQ, Zhou XC, Xu TT, Li M, Yang ZR, Han XY. 2023. Evaluation of Potential Probiotic Properties of Limosilactobacillus fermentum Derived from Piglet Feces and Influence on the Healthy and E. coli-Challenged Porcine Intestine. Microorganisms. 11(4).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      After an additional revision, the bioinformatics section of the methods has changed significantly from previous versions and now indicates a third sequencer was used instead: Ion S5 XL. Important parameters required to replicate analysis have still not been provided. Inspection of the SRA data indicates a mix of Illumina MiSeq and Illumina HiSeq 2500. It is now unclear which sequencing technology was used as authors have variably reported 4 different sequencers for these samples. Appropriate metadata was not provided in the SRA, although some groups may be inferred from sample names. These changing descriptions of the methodologies and ambiguity in making the data available create concerns about rigor of study and results.

      We apologize for the multiple incorrect modifications of the method description, which arose because we confused the sequencing method of this experiment with that of samples from other experiments. We have modified the description of the microbiome sequencing technology on line 304. The sequencing technology is Illumina HiSeq 2500. The SRA metadata can be viewed at https://www.ncbi.nlm.nih.gov/sra/PRJNA837047. The sample names ep1-6 and ef1-6 correspond to the EP and EF groups, respectively.

      Recommendations For the Authors:

      As in the previous revision:

      -provide important parameters required to replicate analysis

      -ensure that reporting of sequencing technology is correct as data listed on SRA appears to be derived from Illumina sequencers, and was deposited indicating as such.

      -update SRA metadata such that experimental groups are clear and match the nomenclature used in the manuscript (particularly for samples which are labelled [A-Z][0-9])

      - The multiple testing correction wasn’t applied.

      -We apologize for the multiple incorrect modifications of the method description, which arose because we confused the sequencing method of this experiment with that of samples from other experiments. We have modified the description of the microbiome sequencing technology on line 304. The sequencing technology is Illumina HiSeq 2500.

      - The SRA metadata can be viewed at https://www.ncbi.nlm.nih.gov/sra/PRJNA837047. The sample names ep1-6 and ef1-6 correspond to the EP and EF groups, respectively.
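
      The stated naming scheme can be made explicit as a small mapping from sample name to experimental group, which is roughly what clearer SRA metadata would encode. This is a sketch under the assumption that the only sample names are ep1-6 and ef1-6, as stated in the response; the function and variable names are hypothetical, and names following other patterns are rejected rather than guessed.

```python
import re

# Assumed scheme from the response: ep1-6 -> EP group, ef1-6 -> EF group.
SAMPLE_PATTERN = re.compile(r"^(ep|ef)([1-6])$")


def sample_group(name: str) -> str:
    """Return the experimental group encoded in a sample name."""
    match = SAMPLE_PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognised sample name: {name!r}")
    return match.group(1).upper()


samples = [f"{prefix}{i}" for prefix in ("ep", "ef") for i in range(1, 7)]
metadata = {name: sample_group(name) for name in samples}
print(metadata)
```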

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Bennion and colleagues present a careful examination of how an earlier set of memories can either interfere with or facilitate memories formed later. This impressive work is a companion piece to an earlier paper by Antony and colleagues (2022) in which a similar experimental design was used to examine how a later set of memories can either interfere with or facilitate memories formed earlier. This study makes contact with an experimental literature spanning 100 years, which is concerned with the nature of forgetting, and the ways in which memories for particular experiences can interact with other memories. These ideas are fundamental to modern theories of human memory. For example, paired-associate studies like this one are central to the theoretical idea that interference between memories is a much bigger contributor to forgetting than any sort of passive decay.

      Strengths: 

      At the heart of the current investigation is a proposal made by Osgood in the 1940s regarding how paired associates are learned and remembered. In these experiments, one learns a pair of items, A-B (cue-target), and then later learns another pair that is related in some way, either A'-B (changing the cue, delta-cue), or A-B' (changing the target, delta-target), or A'-B' (changing both, delta-both), where the prime indicates that item has been modified, and may be semantically related to the original item. The authors refer to the critical to-be-remembered pairs as base pairs. Osgood proposed that when the changed item is very different from the original item there will be interference, and when the changed item is similar to the original item there will be facilitation. Osgood proposed a graphical depiction of his theory in which performance was summarized as a surface, with one axis indicating changes to the cue item of a pair and the other indicating changes to the target item, and the surface itself necessary to visualize the consequences of changing both. 

      In the decades since Osgood's proposal, there have been many studies examining slivers of the proposal, e.g., just changing targets in one experiment, just changing cues in another experiment. Because any pair of experiments uses different methods, this has made it difficult to draw clear conclusions about the effects of particular manipulations. 

      The current paper is a potential landmark, in that the authors manipulate multiple fundamental experimental characteristics using the same general experimental design. Importantly, they manipulate the semantic relatedness of the changed item to the original item, the delay between the study experience and the test, and which aspect of the pair is changed. Furthermore, they include both a positive control condition (where the exact same pair is studied twice), and a negative control condition (where a pair is only studied once, in the same phase as the critical base pairs). This allows them to determine when the prior learning exhibits an interfering effect relative to the negative control condition and also allows them to determine how close any facilitative effects come to matching the positive control. 
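
      The role of the two control conditions can be illustrated with a short calculation. This is a hypothetical sketch, not the authors' analysis: it assumes recall accuracy is summarized as a single proportion per condition, the function name and the numbers are invented, and a real analysis would rely on statistical tests rather than the sign of a difference.

```python
def proactive_effect(condition_acc, negative_control_acc, positive_control_acc):
    """Compare a condition with the negative and positive controls.

    Returns the raw difference from the negative control, the fraction of
    the positive-control benefit that the condition achieves, and a label.
    """
    difference = condition_acc - negative_control_acc
    benefit_span = positive_control_acc - negative_control_acc
    fraction = difference / benefit_span if benefit_span != 0 else float("nan")
    if difference > 0:
        label = "facilitation"
    elif difference < 0:
        label = "interference"
    else:
        label = "no effect"
    return difference, fraction, label


# Hypothetical recall proportions: pair studied once, pair studied twice,
# and a changed-target pair with a semantically related new target.
diff, frac, label = proactive_effect(0.65, 0.50, 0.80)
print(f"difference = {diff:.2f}, fraction of positive control = {frac:.2f}, {label}")
```

      A condition scoring below the negative control would be labelled interference, while a fraction approaching 1 would indicate facilitation comparable to studying the same pair twice.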

      The results are interpreted in terms of a set of existing theories, most prominently the memory-for-change framework, which proposes a mechanism (recursive reminding) potentially responsible for the facilitative effects examined here. One of the central results is the finding that a stronger semantic relationship between a base pair and an earlier pair has a facilitative effect on both the rate of learning of the base pair and the durability of the memory for the base pair. This is consistent with the memory-for-change framework, which proposes that this semantic relationship prompts retrieval of the earlier pair, and the two pairs are integrated into a common memory structure that contains information about which pair was studied in which phase of the experiment. When semantic relatedness is lower, they more often show interference effects, with the idea being that competition between the stored memories makes it more difficult to remember the base pair. 

      This work represents a major methodological and empirical advance for our understanding of paired-associates learning, and it sets a laudably high bar for future work seeking to extend this knowledge further. By manipulating so many factors within one set of experiments, it fills a gap in the prior literature regarding the cognitive validity of an 80-year-old proposal by Osgood. The reader can see where the observed results match Osgood's theory and where they are inconclusive. This gives us insight, for example, into the necessity of including a long delay in one's experiment, to observe potential facilitative effects. This point is theoretically interesting, but it is also a boon for future methodological development, in that it establishes the experimental conditions necessary for examining one or another of these facilitation or interference effects more closely. 

      We thank the reviewer for their thorough and positive comments -- thank you so much!

      Weaknesses: 

      One minor weakness of the work is that the overarching theoretical framing does not necessarily specify the expected result for each and every one of the many effects examined. For example, with a narrower set of semantic associations being considered (all of which are relatively high associations) and a long delay, varying the semantic relatedness of the target item did not reliably affect the memorability of that pair. However, the same analysis showed a significant effect when the wider set of semantic associations was used. The positive result is consistent with the memory-for-change framework, but the null result isn't clearly informative to the theory. I call this a minor weakness because I think the value of this work will grow with time, as memory researchers and theorists use it as a benchmark for new theory development. For example, the data from these experiments will undoubtedly be used to develop and constrain a new generation of computational models of paired-associates learning. 

      We thank the reviewer for this constructive critique. We agree that the experiments with a narrower set of semantic associations are less informative; in fact, we thought about removing these experiments from the current study, but given that we found results in the ΔBoth condition in Antony et al. (2022) using these stimuli that we did NOT find in the wider set, we thought it was worth including for a thorough comparison. We hope that the analyses combining the two experiment sets (Fig 6-Supp 1) are informative for contextualizing the results in the ‘narrower’ experiments and, as the reviewer notes, for informing future researchers.

      Reviewer #2 (Public Review): 

      Summary: 

      The study focuses on how relatedness with existing memories affects the formation and retention of new memories. Of core interest were the conditions that determine when prior memories facilitate new learning or interfere with it. Across a set of experiments that varied the degree of relatedness across memories as well as retention interval, the study compellingly shows that relatedness typically leads to proactive facilitation of new learning, with interference only observed under specific conditions and immediate test and being thus an exception rather than a rule. 

      Strengths: 

      The study uses a well-established word-pair learning paradigm to study interference and facilitation of overlapping memories. However it goes more in-depth than a typical interference study in the systematic variation of several factors: (1) which elements of an association are overlapping and which are altered (change target, change cue, change both, change neither); (2) how much the changed element differs from the original (word relatedness, with two ranges of relatedness considered); (3) retention period (immediate test, 2-day delay). Furthermore, each experiment has a large N sample size, so both significant effects as well as null effects are robust and informative. 

      The results show the benefits of relatedness, but also replicate interference effects in the "change target" condition when the new target is not related to the old target and when the test is immediate. This provides a reconciliation of some existing seemingly contradictory results on the effect of overlap on memory. Here, the whole range of conditions is mapped to convincingly show how the direction of the effect can flip across the surface of relatedness values. 

      Additional strength comes from supporting analyses, such as analyses of learning data, demonstrating that relatedness leads to both better final memory and also faster initial learning. 

      More broadly, the study informs our understanding of memory integration, demonstrating how the interdependence of memory for related information increases with relatedness. Together with a prior study of retroactive interference and facilitation, the results provide new insights into the role of reminding in memory formation. 

      In summary, this is a highly rigorous body of work that sets a great model for future studies and improves our understanding of memory organization. 

      We thank the reviewer for their thorough summary and very supportive words!

      Weaknesses: 

      The evidence for the proactive facilitation driven by relatedness is very convincing. However, in the finer scale results, the continuous relationship between the degree of relatedness and the degree of proactive facilitation/interference is less clear. This could be improved with some additional analyses and/or context and discussion. In the narrower range, the measure used was AS, with values ranging from 0.03-0.98, where even 0.03 still denotes clearly related words (pious - holy). Within this range from "related" to "related a lot", no relationship to the degree of facilitation was found. The wider range results are reported using a different scale, GloVe, with values from -0.14 to 0.95, where the lower end includes unrelated words (sap - laugh). It is possible that any results of facilitation/interference observed in the wider range may be better understood as a somewhat binary effect of relatedness (yes or no) rather than the degree of relatedness, given the results from the narrower condition. These two options could be more explicitly discussed. The report would benefit from providing clearer information about these measures and their range and how they relate to each other (e.g., not a linear transformation). It would be also helpful to know how the values reported on the AS scale would end up if expressed in the GloVe scale (and potentially vice-versa) and how that affects the results. Currently, it is difficult to assess whether the relationship between relatedness and memory is qualitative or quantitative. This is less of a problem with interdependence analyses where the results converge across a narrow and wider range. 

      We thank the reviewer for this point. While other analyses do show differences across the range of AS values we used, we agree in the case of the memorability analysis in the narrower stimulus set, 48-hr experiment (or combining across the narrower and wider stimulus sets), there could be a stronger influence of binary (yes/no) relatedness. We have now made this point explicitly (p. 26):

      “Altogether, these results show that PI can still occur with low relatedness, like in other studies finding PI in ΔTarget (A-B, A-D) paradigms (for a review, see Anderson & Neely, 1996), but PF occurs with higher relatedness. In fact, the absence of low relatedness pairs in the narrower stimulus set likely led to the strong overall PF in this condition across all pairs (positive y-intercept in the upper right of Fig 3A). In this particular instance, there may have been a stronger influence of a binary factor (whether they are related or not), though this remains speculative and is not the case for other analyses in our paper.”

      Additionally, we have emphasized that the two relatedness metrics are not linear transforms of each other. Finally, to address both your comment and Reviewer #3’s comment below, we now graph relatedness values under a common GloVe metric in Fig 1-Supp 1C (p. 9):

      “Please note that GloVe is an entirely different relatedness metric and is not a linear transformation of AS (see Fig 1-Supp 1C for how the two stimulus sets compare using the common GloVe metric).”

      A smaller weakness is generalizability beyond the word set used here. Using a carefully crafted stimulus set and repeating the same word pairings across participants and conditions was important for memorability calculations and some of the other analyses. However, highlighting the inherently noisy item-by-item results, especially in the Osgood-style surface figures, makes it challenging to imagine how the results would generalize to new stimuli, even within the same relatedness ranges as the current stimulus sets. 

      We thank the reviewer for this critique. We have added this caveat in the limitations to suggest that future studies should replicate these general findings with different stimulus sets (p. 28):

      “Finally, future studies could ensure these effects are not limited to these stimuli and generalize to other word stimuli in addition to testing other domains (Baek & Papaj, 2024; Holding, 1976).”

      Reviewer #3 (Public Review): 

      Summary: 

      Bennion et al. investigate how semantic relatedness proactively benefits the learning of new word pairs. The authors draw predictions from Osgood (1949), which posits that the degree of proactive interference (PI) and proactive facilitation (PF) of previously learned items on to-be-learned items depends on the semantic relationships between the old and new information. In the current study, participants learn a set of word pairs ("supplemental pairs"), followed by a second set of pairs ("base pairs"), in which the cue, target, or both words are changed, or the pair is identical. Pairs were drawn from either a narrower or wider stimulus set and were tested after either a 5-minute or 48-hour delay. The results show that semantic relatedness overwhelmingly produces PF and greater memory interdependence between base and supplemental pairs, except in the case of unrelated pairs in a wider stimulus set after a short delay, which produced PI. In their final analyses, the authors compare their current results to previous work from their group studying the analogous retroactive effects of semantic relatedness on memory. These comparisons show generally similar, if slightly weaker, patterns of results. The authors interpret their results in the framework of recursive reminders (Hintzman, 2011), which posits that the semantic relationships between new and old word pairs promote reminders of the old information during the learning of the new to-be-learned information. These reminders help to integrate the old and new information and result in additional retrieval practice opportunities that in turn improve later recall. 

      Strengths: 

      Overall, I thought that the analyses were thorough and well-thought-out and the results were incredibly well-situated in the literature. In particular, I found that the large sample size, inclusion of a wide range of semantic relatedness across the two stimulus sets, variable delays, and the ability to directly compare the current results to their prior results on the retroactive effects of semantic relatedness were particular strengths of the authors' approach and make this an impressive contribution to the existing literature. I thought that their interpretations and conclusions were mostly reasonable and included appropriate caveats (where applicable). 

      We thank the reviewer for this kind, effective summary and highlight of the paper’s strengths!

      Weaknesses: 

      Although I found that the paper was very strong overall, I have three main questions and concerns about the analyses. 

      My first concern lies in the use of the narrow versus wider stimulus sets. I understand why the initial narrow stimulus set was defined using associative similarity (especially in the context of their previous paper on the retroactive effects of semantic similarity), and I also understand their rationale for including an additional wider stimulus set. What I am less clear on, however, is the theoretical justification for separating the datasets. The authors include a section combining them and show in a control analysis that there were no directional effects in the narrow stimulus set. The authors seem to imply in the Discussion that they believe there are global effects of the lower average relatedness on differing patterns of PI vs PF across stimulus sets (lines 549-553), but I wonder if an alternative explanation for some of their conflicting results could be that PI only occurs with pairs of low semantic relatedness between the supplemental and base pair and that because the narrower stimulus set does not include the truly semantically unrelated pairs, there was no evidence of PI. 

      We agree with the reviewer’s interpretation here, and we have now directly stated this in the discussion section (p. 26):

      “Altogether, these results show that PI can still occur with low relatedness, like in other studies finding PI in ΔTarget (A-B, A-D) paradigms (for a review, see Anderson & Neely, 1996), but PF occurs with higher relatedness. In fact, the absence of low relatedness pairs in the narrower stimulus set likely led to the strong overall PF in this condition across all pairs (positive y-intercept in the upper right of Fig 3A).”

      As for the remainder of this concern, please see our response to your elaboration on the critique below.

      My next concern comes from the additive change in both measures (change in Cue + change in Target). This measure is simply a measure of overall change, in which a pair where the cue changes a great deal but the target doesn't change is treated equivalently to a pair where the target changes a lot, but the cue does not change at all, which in turn are treated equivalently to a pair where the cue and target both change moderate amounts. Given that the authors speculate that there are different processes occurring with the changes in cue and target and the lack of relationship between cue+target relatedness and memorability, it might be important to tease apart the relative impact of the changes to the different aspects of the pair. 

      We thank the reviewer for this great point. First, we should clarify that we only added cue and target similarity values in the ΔBoth condition, which means that all instances of equivalence relate to non-zero values for both cue and target similarity. However, it is certainly possible cue and target similarity separately influence memorability or interdependence. We have now run this analysis separately for cue and target similarity (but within the ΔBoth condition). For memorability, neither cue nor target similarity independently predicted memorability within the ΔBoth condition in any of the four main experiments (all p > 0.23). Conversely, there were some relationships with interdependence. In the narrower stimulus set, 48-hr delay experiment, both cue and target similarity significantly or marginally predicted base-secondary pair interdependence (Cue: r = 0.30, p = 0.04; Target: r = 0.29, p = 0.054). Notably, both survived partial correlation analyses partialing out the other factor (Cue: r = 0.33, p = 0.03; Target: r = 0.32, p = 0.04). In the wider stimulus set, 48-hr delay experiment, only target similarity predicted interdependence (Cue: r = 0.09, p = 0.55; Target: r = 0.34, p = 0.02), and target similarity also predicted interdependence after partialing out cue similarity (r = 0.34, p = 0.02). Similarly, in the narrower stimulus set, 5-min delay experiment, only target similarity predicted interdependence (Cue: r = 0.01, p = 0.93; Target: r = 0.41, p = 0.005), and target similarity also predicted interdependence after partialing out cue similarity (r = 0.42, p = 0.005). Neither predicted interdependence in the wider stimulus set, 5-min delay experiment (Cue: r = -0.14, p = 0.36; Target: r = 0.09, p = 0.54). We have opted to leave this out of the paper for now, but we could include it if the reviewer believes it is worthwhile.
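
      For readers who want to see the computation behind "partialing out the other factor", the following is a minimal sketch of a first-order partial correlation. It is not the analysis code used for the paper; the function names and the example values are hypothetical and serve only to illustrate the formula.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after partialing out z (first-order)."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-pair values for illustration only (not data from the study).
cue_similarity = [0.10, 0.40, 0.35, 0.80, 0.60, 0.20]
target_similarity = [0.30, 0.20, 0.70, 0.50, 0.90, 0.10]
interdependence = [0.05, 0.10, 0.30, 0.35, 0.45, 0.02]

# Cue similarity vs. interdependence, controlling for target similarity,
# and the reverse.
r_cue_partial = partial_r(cue_similarity, interdependence, target_similarity)
r_target_partial = partial_r(target_similarity, interdependence, cue_similarity)
```

      When the controlled variable is uncorrelated with both of the others, the partial correlation reduces to the ordinary Pearson correlation, which is a convenient sanity check.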

      Note that we address the multiple regression point raised by the reviewer in the critique below.

      Finally, it is unclear to me whether there was any online spell-checking that occurred during the free recall in the learning phase. If there wasn't, I could imagine a case where words might have accidentally received additional retrieval opportunities during learning - take for example, a case where a participant misspelled "razor" as "razer." In this example, they likely still successfully learned the word pair but if there was no spell-checking that occurred during the learning phase, this would not be considered correct, and the participant would have had an additional learning opportunity for that pair. 

      We did not use online spell checking. We agree that misspellings would be considered successful instances of learning (meaning that for those words, they would essentially have successful retrieval more than once). However, we do not have a reason to think that this would meaningfully differ across conditions, so the main learning results would still hold. We have included this in the Methods (p. 29-30):

      “We did not use spell checking during learning, meaning that in some cases pairs could have been essentially retrieved more than once. However, we do not believe this would differ across conditions to affect learning results.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      In terms of the framing of the paper, I think the paper would benefit from a clearer explication of the different theories at play in the introductory section. There are a few theories being examined. Memory-for-change is described in most detail in the discussion, it would help to describe it more deliberately in the intro. The authors refer to a PI account, and this is contrasted with the memory-for-change account, but it seems to me that these theories are not mutually exclusive. In the discussion, several theories are mentioned in passing without being named, e.g., I believe the authors are referring to the fan effect when they mention the difference between delta-cue and delta-target conditions. Perhaps this could be addressed with a more detailed account of the theory underlying Osgood's predictions, which I believe arise from an associative account of paired-associates memory. Osgood's work took place when there was a big debate between unlearning and interference. The current work isn't designed to speak directly to that old debate. But it may be possible to develop the theory a bit more in the intro, which would go a long way towards scaffolding the many results for the reader, by giving them a better sense up front of the theoretical implications. 

      We thank the reviewer for this comment and the nudge to clarify these points. First, we have now made the memory-for-change and remindings accounts more explicit in the introduction, as well as the fact that we are combining the two in forming predictions for the current study (p. 3):

      “Conversely, in favor of the PF account, we consider two main, related theories. The first is the importance of “remindings” in memory, which involve reinstating representations from an earlier study phase during later learning (Hintzman, 2011). This idea centers study-phase retrieval, which involves being able to mentally recall prior information and is usually applied to exact repetitions of the same material (Benjamin & Tullis, 2010; Hintzman et al., 1975; Siegel & Kahana, 2014; Thios & D’Agostino, 1976; Zou et al., 2023). However, remindings can occur upon the presentation of related (but not identical) material and can result in better memory for both prior and new information when memory for the linked events becomes more interdependent (Hintzman, 2011; Hintzman et al., 1975; McKinley et al., 2019; McKinley & Benjamin, 2020; Schlichting & Preston, 2017; Tullis et al., 2014; Wahlheim & Zacks, 2019). The second is the memory-for-change framework, which builds upon these ideas and argues that humans often retrieve prior experiences during new learning, either spontaneously by noticing changes from what was learned previously or by instruction (Jacoby et al., 2015; Jacoby & Wahlheim, 2013). The key advance of this framework is that recollecting changes is necessary for PF, whereas PI occurs without recollection. This framework has been applied to paradigms including stimulus changes, including common paired associate paradigms (e.g., A-B, A-D) that we cover extensively later. Because humans may be more likely to notice and recall prior information when it is more related to new information, these two accounts would predict that semantic relatedness instead promotes successful remindings, which would create PF and interdependence among the traces.”

      Second, as the reviewer suggests, we were referring to the fan effect in the discussion, and we have now made that more explicit (p. 26):

      “We believe these effects arise from the competing processes of impairments between competing responses at retrieval that have not been integrated versus retrieval benefits when that integration has occurred (which occurs especially often with high target relatedness). These types of competing processes appear operative in various associative learning paradigms such as retrieval-induced forgetting (Anderson & McCulloch, 1999; Carroll et al., 2007), and the fan effect (Moeser, 1979; Reder & Anderson, 1980).”

      Finally, our reading of Osgood’s proposal is that it was an attempt to summarize the qualitative effects of the scattered literature (as of 1949) and that it did not discuss many theories. For this reason, we generally focus on the directional predictions relating to Osgood’s surface, but we couch it in theories proposed since then.

      It strikes me that the advantage seen for items in the retroactive study compared to the proactive study is consistent with classic findings examining spontaneous recovery. These classic studies found that first-learned materials tended to recover to a level above second-learned materials as time passed. This could be consistent with the memory-for-change proposal presented in the text. The memory-for-change proposal provides a potential cognitive mechanism for the effect, here I'm just suggesting a connection that could be made with the spontaneous recovery literature. 

      We thank the reviewer for this suggestion. Indeed, we agree there is a meaningful point of connection here. We have added the following to the Discussion (p. 27):

      “Additionally, these effects partially resemble those on spontaneous recovery, whereby original associations tend to face interference after new, conflicting learning, but slowly recover over time (either absolutely or relative to the new learning) and often eventually eclipse memory for the new information (Barnes & Underwood, 1959; Postman et al., 1969; Wheeler, 1995). In both cases, original associations appear more robust to change over time, though it is unclear whether these similar outcomes stem from similar mechanisms.”

      Minor recommendations 

      Line 89: relative existing -> relative to existing. 

      Line 132: "line from an unrelated and identical target" -> from an unrelated to identical target (take a look, just needs rephrasing). 

      Line 340: (e.g. peace-shaverazor) I wasn't clear whether this was a typographical error, or whether the intent was to typographically indicate a unified representation. 

      Line 383: effects on relatedness -> effects of relatedness. 

      We thank the reviewer for catching these errors. We have fixed them, and for the third comment, we have clarified that we indeed meant to indicate a unified representation (p. 12):

      “[e.g., peace-shaverazor (written jointly to emphasize the unification)]”

      Page 24: Figure 8. I think the statistical tests in this figure are just being done between the pairs of the same color? Like in the top left panel, delta-cue pro and delta-target retro are adjacent and look equivalent, but there is no n.s. marking for this pair. Could consider keeping the connecting line between the linked conditions and removing the connecting lines that span different conditions. 

      Indeed, we were only comparing conditions with the same color. We have changed the connecting lines to reflect this.

      Page 26 line 612: I think this is the first mention that the remindings account is referred to as the memory-for-change framework, consider mentioning this in the introduction. 

      Thank you – we have now mentioned this in the introduction.

      Lines 627-630. Is this sentence referring to the fan effect? If so it could help the reader to name it explicitly. 

      We have now named this explicitly.

      Reviewer #2 (Recommendations For The Authors): 

      This is a matter of personal preference, but I would prefer PI and PF spelled out instead of the abbreviations. This was also true for RI and RF which are defined early but then not used for 20 pages before being re-used again. In contrast, the naming of the within-subject conditions was very intuitive. 

      We appreciate this perspective. However, we prefer to keep the terms PI and PF for the sake of brevity. We now re-introduce terms that do not return until later in the manuscript.

      Osgood surface in Figure 1A could be easier to read if slightly reformatted. For example, target and cue relatedness sides are very disproportional and I kept wondering if that was intentional. The z-axis could be slightly more exaggerated so it's easier to see the critical messages in that figure (e.g., flip from + to - effect along the one dimension). The example word pairs were extremely helpful. 

      Figures 1C and 1D were also very helpful. It would be great if they could be a little bigger as the current version is hard to read. 

      Figure 1B took a while to decipher and could use a little more anticipation in the body of the text. Any reason to plot the x-axis from high to low on this figure? It is confusing (and not done in the actual results figures). I believe the supplemental GloVe equivalent in the supplement also has a confusing x-axis. 

      We thank the reviewer for this feedback. We have modified Figure 1A to reduce the disproportionality and accentuate the z-axis changes. We have also made the text in C and D larger. Finally, we have flipped around the x-axis in B and in the supplement.

      The description of relatedness values was rather confusing. It is not intuitive to accept that AS values from 0.03-0.96 are "narrow", as that seems to cover almost the whole theoretical range. I do understand that 0.03 is still a value showing relatedness, but more explanation would be helpful. It is also not clear how the GloVe values compare to the AS values. If I am understanding the measures and ranges correctly, the "narrow" condition could also be called "related only" while the "wide" condition could be called "related and unrelated". This is somewhat verbalized but could be clearer. In general, please provide a straightforward way for a reader to explicitly or implicitly compare those conditions, or even plot the "narrow" condition using both AS values and GloVe values so one can really compare narrow and wider conditions comparing apples with apples. 

      We thank the reviewer for this critique. First, we have now sought to clarify this in the Introduction (p. 11-12):

      “Across the first four experiments, we manipulated two factors: range of relatedness among the pairs and retention interval before the final test. The narrower range of relatedness used direct AS between pairs using free association norms, such that all pairs had between 0.03-0.96 association strength. Though this encompasses what appears to be a full range of relatedness values, pairs with even low AS are still related in the context of all possible associations (e.g., pious-holy has AS = 0.03 but would generally be considered related) (Fig 1B). The stimuli using a wider range of relatedness spanned the full range of global vector similarity (Pennington et al., 2014) that included many associations that would truly be considered unrelated (Fig 1-Supp 1A). One can see the range of the wider relatedness values in Fig 1-Supp 1B and comparisons between narrower and wider relatedness values in Fig 1-Supp 1C.”

      Additionally, as noted in the text above, we have added a new subfigure to Fig 1-Supp 1 that compares the relatedness values in the narrower and wider stimulus sets using the common GloVe metric.

      Considering a relationship other than linear may also be beneficial (e.g., the difference between AS of 0.03 and 0.13 may not be equal to AS of .83 and .93; same with GloVe). I am assuming that AS and GloVe are not linear transforms of each other. Thus, it is not clear whether one should expect a linear (rather than curvilinear or another monotonic) relationship with both of them. It could be as simple as considering rank-order correlation rather than linear correlation, but just wanted to put this out for consideration. The linear approach is still clearly fruitful (e.g., interdependence), but limits further the utility of having both narrow and wide conditions without a straightforward way to compare them. 

      We thank the reviewer for this point. Indeed, AS and GloVe are not linear transforms of each other, but metrics derived from different sources (AS comes from human free associations; GloVe comes from a learned vector space language model). (We noted this in the text and in our response to your above comment.) However, we do have the ability to put all the word pairs into the GloVe metric, which we do in the Results section, “Re-assessing proactive memory and interdependence effects using a common metric”. In this analysis, we used a linear correlation that combined data sets with a similar retention interval and replicated our main findings earlier in the paper (p. 5):

      “In the 48-hr delay experiment, correlations between memorability and cue relatedness in the ΔCue condition [r²(44) > 0.29, p < 0.001] and target relatedness in the ΔTarget condition [r²(44) = 0.2, p < 0.001] were significant, whereas cue+target relatedness in the ΔBoth condition was not [r²(44) = 0.01, p = 0.58]. In all three conditions, interdependence increased with relatedness [all r²(44) > 0.16, p < 0.001].”

      Following the reviewer suggestion to test things out using rank order, we also re-created the combined analysis using rank order based on GloVe values rather than the raw GloVe values. The ranks now span 1-90 (because there were 45 pairs in each of the narrower and wider stimulus sets). All results qualitatively held.
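
      As an illustration of the rank-order transformation described above, the sketch below converts relatedness values to ranks (averaging ties) and correlates those ranks with an outcome. It is a minimal sketch rather than our analysis code: the function names and the example values are hypothetical, and the tie-handling rule is an assumption.

```python
import math


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical GloVe relatedness values and memorability scores
# (illustration only, not data from the study).
glove_values = [-0.14, 0.05, 0.22, 0.40, 0.61, 0.95]
memorability = [-0.05, 0.02, 0.01, 0.08, 0.12, 0.15]

glove_ranks = average_ranks(glove_values)
r_rank = pearson_r(glove_ranks, memorability)
```

      With 90 pairs and no ties, `average_ranks` would return the values 1 through 90, matching the span of ranks noted above.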

      Author response image 1.

      Rank order results.

      Author response image 2.

      And the raw results in Fig 6-Supp 1 (as a reference).

      Reviewer #3 (Recommendations For The Authors):

      In regards to my first concern, the authors could potentially test whether the stimulus sets are different by specifically looking at pairs from the wider stimulus set that overlap with the range of relatedness from the narrow set and see if they replicate the results from the narrow stimulus set. If the results do not differ, the authors could simplify their results section by collapsing across stimulus sets (as they did in the analyses presented in Figure 6 - Supplementary Figure 1). If the authors opt to keep the stimulus sets separate, it would be helpful to include a version of Figure 1b/Figure 1 - Supplementary Figure 1 where the coverage of the two stimulus sets are plotted on the same figure using GloVe similarity so it is easier to interpret the results. 

      We have conducted this analysis in two ways, though we note that we ultimately settled upon keeping the stimulus sets separate. First, we examined memorability between the data sets by removing one pair at a time from the wider stimulus set until there was no significant difference (p > 0.05). We did this at the long delay because that was more informative for most of our analyses. Even after reducing the wider stimulus set, the narrow stimulus set still had significantly or marginally higher memorability in all three conditions (p < 0.001 for ΔCue; p < 0.001 for ΔTarget; p = 0.08 for ΔBoth). We reasoned that this was likely because the AS values still differed (all p < 0.001), which would present a clear way for participants to associate words that may not be as strongly similar in vector space (perhaps due to polysemy for individual words). When we ran the analysis a different way that equated AS, we no longer found significant memorability differences (p = 0.13 for ΔCue; p = 0.50 for ΔTarget; p = 0.18 for ΔBoth). However, equating the two data sets in this analysis required us to drop so many pairs from the wider stimulus set (because only a few had a direct AS connection; there were 3, 5, and 1 pairs kept in the ΔCue, ΔTarget, and ΔBoth conditions) that we would prefer not to report this result.

      Additionally, we now plot the two stimulus sets on the same plot (Reviewer 2 also suggested this).

      In regards to my second concern, one potential way the authors could disambiguate the effects of change in cue vs change in target might be to run a multiple linear regression with change in Cue, change in Target, and the change in Cue*change in Target interaction (potentially with random effects of subject identity and word pair identity to combine experiments and control for pair memorability/counterbalancing), which has the additional bonus of potentially allowing the authors to include all word pairs in a single model and better describe the Osgood-style spaces in Figure 6.

      This is a very interesting idea. We set this analysis up as the reviewer suggested, using fixed effects for ΔCue, ΔTarget, and ΔCue*ΔTarget, and random effects for subject and word ID. Because we had a binary outcome variable, we used mixed effects logistic regression. For a given pair, if it had the same cue or target, the corresponding change column received a 0, and if it had a different cue or target, it received a graded value (1 - GloVe value between the new and old cue or target). Because we designed this analysis to indicate a treatment away from a repeat (as in the No Δ condition, which had no change in either cue or target), we omitted control items. For items in the ΔBoth condition, we initially used positive values in both the Cue and Target columns too, with the multiplied ΔCue*ΔTarget value in its own column. We focused these analyses on the 48-hr delay experiments. In both experiments, running it this way resulted in highly significant negative effects of ΔCue and ΔTarget (both p < 0.001), but positive effects of ΔCue*ΔTarget (p < 0.001), presumably because after accounting for the negative independent predictions of both ΔCue and ΔTarget, ΔCue*ΔTarget values actually were better than expected.

      We thought that those results were a little strange given that generally there did not appear to be interactions with ΔCue*ΔTarget values, and we suspected that the positive result was simply due to the other predictors in the model. To show that this is the case, we changed the predictors so that items in the ΔBoth condition had 0 in the ΔCue and ΔTarget columns alongside their ΔCue*ΔTarget value. In this case, all three factors negatively predicted memory (all p < 0.001).
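
      The two distance-based coding schemes described above can be sketched as follows. This is only an illustrative sketch, not the code used for the reported analyses: the function name, the condition labels, and the `zero_main_effects_for_both` flag are hypothetical, and the mixed effects logistic regression itself (with random effects for subject and word ID) is not shown. Control items are omitted, as in the text.

```python
from typing import NamedTuple, Optional


class Predictors(NamedTuple):
    d_cue: float        # 1 - GloVe similarity of old vs. new cue (0 if cue unchanged)
    d_target: float     # 1 - GloVe similarity of old vs. new target (0 if target unchanged)
    interaction: float  # product of the two distances


# Hypothetical condition labels; control items are not coded here.
CONDITIONS = ("no_change", "d_cue", "d_target", "d_both")


def code_distance_predictors(
    condition: str,
    cue_sim: Optional[float] = None,
    target_sim: Optional[float] = None,
    zero_main_effects_for_both: bool = False,
) -> Predictors:
    """Build the three fixed-effect columns for one word pair.

    With zero_main_effects_for_both=False, pairs in the "both changed"
    condition keep their graded cue and target distances (first scheme).
    With True, those two columns are set to 0 and only the interaction
    column carries the product (second scheme).
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")

    cue_changed = condition in ("d_cue", "d_both")
    target_changed = condition in ("d_target", "d_both")
    if cue_changed and cue_sim is None:
        raise ValueError("cue_sim is required when the cue changes")
    if target_changed and target_sim is None:
        raise ValueError("target_sim is required when the target changes")

    d_cue = 1.0 - cue_sim if cue_changed else 0.0
    d_target = 1.0 - target_sim if target_changed else 0.0
    interaction = d_cue * d_target

    if condition == "d_both" and zero_main_effects_for_both:
        d_cue, d_target = 0.0, 0.0

    return Predictors(d_cue, d_target, interaction)
```

      Under this sketch, a pair whose cue and target both changed contributes to all three columns in the first scheme but only to the interaction column in the second, which is the difference between the two sets of results reported above.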

      We don't necessarily see this second approach as better, partly because it seems clear to us that any direction you go from identity is just hurting memory, and we felt the need to drop the control condition. We next flipped around the analysis to more closely resemble how we ran the other analyses, using similarity instead of distance. Here, identity along any dimension was coded as 1, a change in any part of the pair was coded with that pair’s GloVe value (rather than 1 – the GloVe value, as above), and the control condition simply had zeros in all the columns. In this case, if we code the cue and target similarity values as themselves in the ΔBoth condition, in both 48-hr experiments, cue and target similarity significantly positively predicted memory (narrower set: cue similarity had p = 0.006, target similarity had p < 0.001; wider set: both p < 0.001) and the interaction term negatively predicted memory (p < 0.001 in both). If we code cue and target similarity values as 0s in the ΔBoth condition, all three factors tend to be positive (narrower, Cue: p = 0.11, Target and Interaction: p < 0.001; wider, Cue and Target: p < 0.001; Interaction: p = 0.07).

      Ultimately, we would prefer to leave this out of the manuscript in the interest of simplicity and because we largely find that these analyses support our prior conclusions. However, we could include them if the reviewer prefers.

    2. eLife Assessment

      This important work advances our understanding of how memories interact to facilitate or interfere with one another, also informing our understanding of how humans build knowledge. The study provides compelling evidence that semantic relatedness proactively benefits memory using clean experimental design, rigorous statistics, large N samples, and well-characterized stimuli. The study also demonstrates the boundaries of these proactive benefits, showing that when studied items have weaker semantic relationships, proactive interference may be observed. This research will be of interest to memory researchers as well as cognitive psychologists, neuroscientists, and educators more broadly.

    3. Reviewer #1 (Public review):

      Summary:

      Bennion and colleagues present a careful examination of how an earlier set of memories can either interfere with or facilitate memories formed later. This impressive work is a companion piece to an earlier paper by Antony and colleagues (2022) in which a similar experimental design was used to examine how a later set of memories can either interfere with or facilitate memories formed earlier. This study makes contact with an experimental literature spanning 100 years, which is concerned with the nature of forgetting, and the ways in which memories for particular experiences can interact with other memories. These ideas are fundamental to modern theories of human memory; for example, paired-associates studies like this one are central to the theoretical idea that interference between memories is a much bigger contributor to forgetting than any sort of passive decay.

      Strengths:

      At the heart of the current investigation is a proposal made by Osgood in the 1940s regarding how paired associates are learned and remembered. In these experiments one learns a pair of items, A-B (cue-target), and then later learns another pair that is related in some way, either A'-B (changing the cue, delta-cue), or A-B' (changing the target, delta-target), or A'-B' (changing both, delta-both), where the prime indicates that item has been modified, and may be semantically related to the original item. The authors refer to the critical to-be-remembered pairs as base pairs. Osgood proposed that when the changed item is very different from the original item there will be interference, and when the changed item is similar to the original item there will be facilitation. Osgood proposed a graphical depiction of his theory in which performance was summarized as a surface, with one axis indicating changes to the cue item of a pair and the other indicating changes to the target item, and the surface itself necessary to visualize the consequences of changing both.

      In the decades since Osgood's proposal, there have been many studies examining slivers of the proposal, e.g., just changing targets in one experiment, just changing cues in another experiment. Because any pair of experiments use different methods, this has made it difficult to draw clear conclusions about the effects of particular manipulations.

      The current paper is a potential landmark, in that they manipulate multiple fundamental experimental characteristics using the same general experimental design. Importantly, they manipulate the semantic relatedness of the changed item to the original item, the delay between the study experience and the test, and which aspect of the pair is changed. Furthermore, they include both a positive control condition (where the exact same pair is studied twice), and a negative control condition (where a pair is only studied once, in the same phase as the critical base pairs). This allows them to determine when the prior learning exhibits an interfering effect relative to the negative control condition, and also allows them to determine how close any facilitative effects come to matching the positive control.

      The results are interpreted in terms of a set of existing theories, most prominently the memory-for-change framework, which proposes a mechanism (recursive reminding) potentially responsible for the facilitative effects examined here. One of the central results is the finding that a stronger semantic relationship between a base pair and an earlier pair has a facilitative effect on both the rate of learning of the base pair and the durability of the memory for the base pair. This is consistent with the memory-for-change framework, which proposes that this semantic relationship prompts retrieval of the earlier pair, and the two pairs are integrated into a common memory structure that contains information about which pair was studied in which phase of the experiment. When semantic relatedness is lower, they more often show interference effects, with the idea being that competition between the stored memories makes it more difficult to remember the base pair.

      This work represents a major methodological and empirical advance for our understanding of paired-associates learning, and it sets a laudably high bar for future work seeking to extend this knowledge further. By manipulating so many factors within one set of experiments, it fills a gap in the prior literature regarding the cognitive validity of an 80-year-old proposal by Osgood. The reader can see where the observed results match Osgood's theory and where they are inconclusive. This gives us insight, for example, into the necessity of including a long delay in one's experiment, to observe potential facilitative effects. This point is theoretically interesting, but it is also a boon for future methodological development, in that it establishes the experimental conditions necessary for examining one or another of these facilitation or interference effects more closely.

      The authors were exceptionally responsive to the suggestions of the reviewers, and the revisions have improved the theoretical clarity of the paper. I think the value of this work will grow with time, as memory researchers and theorists use it as a benchmark for new theory development. For example, the data from these experiments will undoubtedly be used to develop and constrain a new generation of computational models of paired-associates learning.

      Weaknesses:

      One minor weakness of the work is that the overarching theoretical framing does not necessarily specify the expected result for each and every one of the many effects examined. For example, with a narrower set of semantic associations being considered (all of which are relatively high associations) and a long delay, varying the semantic relatedness of the target item did not reliably affect the memorability of that pair. However, the same analysis showed a significant effect when the wider set of semantic associations was used. The positive result is consistent with the memory-for-change framework, but the null result isn't clearly informative to the theory. However, research is never done; comparing the results with the two sets of semantic associations is informative from a methodological perspective, in that it establishes the degree to which semantic relatedness must be altered to affect behavioral performance in a paired-associates task.

    4. Reviewer #2 (Public review):

      Summary:

      The study focuses on how relatedness with existing memories affects the formation and retention of new memories. Of core interest were the conditions that determine when prior memories facilitate new learning or interfere with it. Across a set of experiments that varied the degree of relatedness across memories as well as retention interval, the study compellingly shows that relatedness typically leads to proactive facilitation of new learning, with interference observed only under specific conditions and at immediate test, making it the exception rather than the rule.

      Strengths:

      The study uses a well-established word-pair learning paradigm to study interference and facilitation of overlapping memories. It however goes more in depth than a typical interference study in the systematic variation of several factors: (1) which elements of an association are overlapping and which are altered (change target, change cue, change both, change neither); (2) how much the changed element differs from the original (word relatedness, with two ranges of relatedness considered); (3) retention period (immediate test, 2-day delay). Furthermore, each experiment has a large N sample size, so both significant effects as well as null effects are robust and informative.

      The results show the benefits of relatedness, but also replicate interference effects in the "change target" condition when the new target is not related to the old target and when test is immediate. This provides reconciliation of some existing seemingly contradictory results on the effect of overlap on memory. Here, the whole range of conditions is mapped to convincingly show how the direction of the effect can flip across the surface of relatedness values.

      Additional strength comes from supporting analyses, such as analyses of learning data, demonstrating that relatedness leads to both better final memory and also faster initial learning.

      More broadly, the study informs our understanding of memory integration, demonstrating how interdependence of memory for related information increases with relatedness. Together with a prior study of retroactive interference and facilitation, the results provide new insights into the role of reminding in memory formation.

      In summary, this is a highly rigorous body of work that sets a great model for future studies and improves our understanding of memory organization.

      Weaknesses:

      The evidence for the proactive facilitation driven by relatedness is very convincing. However, in the finer scale results, the continuous relationship between the degree of relatedness and the degree of proactive facilitation/interference is less clear. The relationship was only found in the wider stimulus set, where some pairs were unrelated and other pairs related, and only when the GloVe metric for measuring relatedness was used. The absence of a relationship between relatedness and memory in the narrow stimulus set (where all pairs were related to some degree) suggests this could potentially be an all-or-none effect (facilitation for related pairs) rather than a matter of degree. Furthermore, a different metric of relatedness, associative strength (AS), did not show the same relationship. The discrepancy between the metrics is not fully resolved. This is less of a problem with the interdependence analyses, where the results converge more across the narrow and wider ranges as well as the two metrics.

      A smaller weakness, acknowledged by the authors, is generalizability beyond the word set used here. Using a carefully crafted stimulus set and repeating the same word pairings across participants and conditions was important for memorability calculations and some of the other analyses. However, highlighting the inherently noisy item-by-item results, especially in the Osgood-style surface figures, makes it challenging to imagine how the results would generalize to new stimuli, even within the same relatedness ranges as the current stimulus sets.

    5. Reviewer #3 (Public review):

      Summary:

      Bennion et al. investigate how semantic relatedness proactively benefits the learning of new word pairs. The authors draw predictions from Osgood (1949), which posits that the degree of proactive interference (PI) and proactive facilitation (PF) of previously learned items on to-be-learned items depends on the semantic relationships between the old and new information. In the current study, participants learn a set of word pairs ("supplemental pairs"), followed by a second set of pairs ("base pairs"), in which the cue, target or both words are changed, or the pair is identical. Pairs were drawn from either a narrower or wider stimulus set and were tested after either a 5-minute or 48-hour delay. The results show that semantic relatedness overwhelmingly produces PF and greater memory interdependence between base and supplemental pairs, except in the case of unrelated pairs in a wider stimulus set after a short delay, which produced PI. In their final analyses, the authors compare their current results to previous work from their group studying the analogous retroactive effects of semantic relatedness on memory. These comparisons show generally similar, if slightly weaker, patterns of results. The authors interpret their results in the framework of recursive reminders (Hintzman, 2011), which posits that the semantic relationships between new and old word pairs promote reminders of the old information during the learning of the new to-be-learned information. These reminders help to integrate the old and new information and result in additional retrieval practice opportunities that in turn improve later recall.

      Strengths:

      Overall, I thought that the analyses were thorough and well-thought-out and the results were incredibly well-situated in the literature, especially with the additional clarification and framing that the authors have made in response to reviewer comments. In particular, I found that the large sample size, inclusion of a wide range of semantic relatedness across the two stimulus sets, variable delays and the ability to directly compare the current results to their prior results on the retroactive effects of semantic relatedness were particular strengths of the authors' approach and make this an impressive contribution to the existing literature. I thought that their interpretations and conclusions were mostly reasonable and included appropriate caveats (where applicable).

      Weaknesses:

      The changes and additional analyses that the authors have made have addressed my concerns about their analyses. Including the additional Fig 1- Supp 1, panel C greatly helps with the interpretability across stimulus sets, and the additional analyses the authors have performed teasing apart whether cue and target similarity separately influence memorability and interdependence seem to support the rest of their conclusions.

    1. eLife assessment

      In this valuable study, the authors analyze droplet size distributions of multiple protein condensates and their fit to a scaling ansatz, highlighting that they exhibit features of first- and second-order phase transitions. The experimental evidence is solid, and it prompts further research into the nature of the link between percolation and phase separation.

    2. Reviewer #1 (Public Review):

      The authors analyse droplet size distributions of multiple protein condensates and fit to a scaling ansatz to highlight that they exhibit features of first-order and second-order phase transitions. While the experimental evidence is solid, the text lacks connection and contextualization to the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community. The evidence supports the percolation + phase separation model rather than being close to a true critical point in the liquid-gas phase space. Overall, the work is useful to the community.

      Strengths:

      The experimental analysis of distinct protein condensates is very well done and the reported exponents/scaling framework provides a clear framework to help the community deconvolve signatures of percolation in condensates.

      Weaknesses:

      The principal concern this reviewer has is that the authors adopt a framing in this paper to present a discovery of second-order features and connections to criticality - however they ignore/miss the connections to percolation (a well-understood second-order transition that is expected to play a major role in protein condensates). I believe this needs to be addressed and the paper suitably revised to help connect with these expectations.

      - Protein condensates have been increasingly understood to be described as fluids whose assembly is driven by a connection of density (phase separation, first-order) and connectivity (percolation, second-order) transitions. This has been long known in the polymer community (Flory, Stockmayer, Tanaka, Rubinstein, Semenov and others) and recently repopularized in the condensate community (by Pappu and Mittag, in particular, amongst others). The authors make no connections to any of these frameworks - which actually seem to be the essence of what they are describing.

      - Percolation theory, which has been around for more than half-a-century, has clear-cut scaling laws that have essentially similar forms to the ansatz adopted by the authors and the commonalities/differences are not discussed by the authors - this is essential since this provides a physical basis for their ansatz rather than an arbitrary mathematical formulation. In particular, percolation models connect size distribution exponents to factors like dimensionality, valence, etc. and if these connections can be made with this data, that would be very powerful.

      - The connections between spinodal decomposition and second-order phase transitions are very confusing. Spinodal decomposition happens when the barriers for first-order phase transitions are zero and systems can phase separate without crossing nucleation barriers. Further, the "criticality" discussed in the paper is confusing since it more likely refers to a percolation threshold and much less likely to a "critical temperature" (Tc, where the spinodal and binodal become identical). I would recommend reframing this argument.

      It's unlikely, in this reviewer's opinion, that the authors are actually discussing a "first-order" liquid-gas critical point - because saturation concentrations of these proteins can be much higher with temperature and the critical point would thus likely be at much higher concentrations (and of course temperature). Further, the scaling exponents don't fall in that class naturally. However, if the authors disagree, I would appreciate clear quantitative reasons (including through the scaling exponents in that universality class) and be happy to be convinced to change my mind. As provided, the data do not support this model.

    3. Reviewer #2 (Public Review):

      In response to the two referee reports, the authors have made substantial improvements. Regarding my previous concerns, the new data provided in Fig.6 for demonstrating that the droplet size distribution is stable over time is particularly valuable.

      As to several of my other previous concerns regarding possible change in droplet size distribution over time, etc., the authors responded by stating that their system was below the critical concentration and therefore the possible scenarios pointed out in my previous report were not expected. While there may be a certain degree of validity to their argument, it would be much more helpful to the readers if the authors would bring up my previous concerns briefly (as readers of the journal will likely have similar concerns) and then address them succinctly within the manuscript.

      Apparently, as a key element in the authors' response to the referees, the term "transition concentration" in the originally submitted manuscript is now changed to "critical concentration" (including in the title and abstract). But the two terms do not have identical meaning. A transition concentration is usually recognized as the saturation concentration at which phase separation or some other transition process commences at a given temperature. The transition concentration can be lower than the critical concentration, whereas the critical concentration is associated with the critical temperature, above (or below, depending on the temperature dependence of phase separation) which phase separation is not possible. It will be best if the authors can clarify their usage of transition concentration vs. critical concentration in the version of record of their manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      In this useful study, the authors analyze droplet size distributions of multiple protein condensates and their fit to a scaling ansatz, highlighting that they exhibit features of first- and second-order phase transitions. The experimental evidence is still incomplete as the measurements were apparently done only at one time point, neglecting the possibility that droplet size distribution can evolve with time. The text would benefit from a connection to and contextualization with the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community and that emphasizes "liquid-gas" criticality. 

      We have now carried out new experiments at multiple time points to establish that the droplet size distributions are stationary below the critical concentration. We have also addressed the comments made by the reviewers about the nature of the phase transition.

      Our analysis does not depend on a specific hypothesis on the nature of the phase transition, whether it be percolation or a gas-liquid critical transition. The scaling that we observed is an emergent property that is independent from the possible theoretical models used to describe the phase transition. In fact, our scaling analysis indicates that any theoretical model proposed for protein phase separation should predict the critical exponents that we reported. 

      Reviewer #1

      The authors analyse droplet size distributions of multiple protein condensates and fit to a scaling ansatz to highlight that they exhibit features of first-order and second-order phase transitions. While the experimental evidence is solid, the text lacks connection and contextualization to the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community. The evidence supports the percolation and phase separation model rather than being close to a true critical point in the liquid-gas phase space. Overall, the work is useful to the community.

      We are grateful to the reviewer for these positive comments. We would like to emphasise that our contribution is not to propose a theoretical model, but rather to report a scaling behaviour in the experimentally measured droplet size distributions. The main implication of our work is that any theoretical model should predict the scaling exponents that we derived from the experimental measurements.

      Strengths: 

      The experimental analysis of distinct protein condensates is very well done and the reported exponents/scaling framework provides a clear framework to help the community deconvolve signatures of percolation in condensates. 

      Weaknesses: 

      The principal concern this reviewer has is that the authors adopt a framing in this paper to present a discovery of second-order features and connections to criticality - however, they ignore/miss the connections to percolation (a well-understood second-order transition that is expected to play a major role in protein condensates). I believe this needs to be addressed and the paper suitably revised to help connect with these expectations. 

      The scaling that we found is not characteristic of standard percolation, since the exponents that we obtained (a=0 and f=1) are different from those of percolation (a=1.19 and f=2.21). This difference indicates that protein phase separation is not in the same universality class as standard percolation. Further studies will be required to understand whether theoretical models based on percolation could predict the observed critical exponents.

      - Protein condensates have been increasingly understood to be described as fluids whose assembly is driven by a connection of density (phase separation, first-order) and connectivity (percolation, second-order) transitions. This has been long known in the polymer community (Flory, Stockmayer, Tanaka, Rubinstein, Semenov, and others) and recently repopularized in the condensate community (by Pappu and Mittag, in particular, amongst others). The authors make no connections to any of these frameworks - which actually seem to be the essence of what they are describing. 

      As mentioned above, our purpose was neither to support an existing theoretical model, nor to propose a new one. Rather, we have reported a scaling behaviour and scaling exponents not noted before. Further studies will be required to establish whether existing theoretical models could account for this scaling behaviour.

      - Percolation theory, which has been around for more than half a century, has clear-cut scaling laws that have essentially similar forms to the ansatz adopted by the authors, and the commonalities/differences are not discussed by the authors - this is essential since this provides a physical basis for their ansatz rather than an arbitrary mathematical formulation. In particular, percolation models connect size distribution exponents to factors like dimensionality, valence, etc. and if these connections can be made with this data, that would be very powerful. 

      The scaling ansatz that we are using is commonly adopted in studies of critical phenomena, and it is not specific to percolation. The scaling exponents depend only on a few attributes, such as dimensionality, symmetries, and whether interactions are short- or long-range. These attributes determine the universality class. As such, scaling does not link with molecular determinants, but can distinguish different classes.
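
      For orientation, a generic finite-size scaling form of the kind commonly used for size distributions near a critical point can be written as below. This is a sketch under stated assumptions rather than a quotation of the manuscript: it assumes that the exponents written a and f in this response correspond to the exponents usually denoted α and φ, that s is the droplet size, ρ the protein concentration, ρ_c the critical concentration, s_c a characteristic cutoff size, and g an unspecified scaling function.

```latex
% Generic scaling ansatz (assumed notation; g is an unspecified scaling function)
P(s \mid \rho) = s^{-\alpha}\, g\!\left(\frac{s}{s_c}\right),
\qquad
s_c \propto \left|\rho_c - \rho\right|^{-\varphi}
```

      Under these assumptions, the exponents quoted earlier in this response (a=0, f=1) would give a distribution with no power-law prefactor and a cutoff size that grows as the inverse of the distance from the critical concentration, whereas the percolation values (a=1.19, f=2.21) would give a power-law prefactor and a much faster divergence of the cutoff.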

      - The connections between spinodal decomposition and second-order phase transitions are very confusing. Spinodal decomposition happens when the barriers for first-order phase transitions are zero and systems can phase separate without crossing nucleation barriers. Further, the "criticality" discussed in the paper is confusing since it more likely refers to a percolation threshold and much less likely to a "critical temperature" (Tc, where the spinodal and binodal become identical). I would recommend reframing this argument. 

      We cannot refer to a percolation threshold, as our model is not readily compatible with it. We have elaborated on and better explained the differences between these models.

      It's unlikely, in this reviewer's opinion, that the authors are actually discussing a "first-order" liquid-gas critical point - because saturation concentrations of these proteins can be much higher with temperature and the critical point would thus likely be at much higher concentrations (and of course temperature). Further, the scaling exponents don't fall into that class naturally. However, if the authors disagree, I would appreciate clear quantitative reasons (including through the scaling exponents in that universality class) and be happy to be convinced to change my mind. As provided, the data do not support this model. 

      We have now clarified in the manuscript that we do not discuss the liquid-gas critical point.

      Reviewer #2

      This is a potentially interesting study addressing a possible scale-invariant log-normal characteristic of droplet size distribution in the phase separation behavior of biomolecular condensates. Some of the data presented are valuable and intriguing. However, as it stands, the validity and utility of this study are uncertain because there are serious deficiencies in the execution and presentation of the authors' results. Many of these shortcomings are fundamental, including a lack of clarity in the basic conceptual framework of the study, insufficient justification of the experimental setup, less-than-conclusive experimental evidence, and inadequate discussion of implications of the authors' findings to future experimental and theoretical studies of biomolecular condensates. Accordingly, this reviewer considers that the manuscript should undergo a major revision to address the following. In particular, the discussion should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      We thank the reviewer for the helpful comments. In the revised version of the manuscript we clarified that we aimed to use a well-established tool for studying phase transitions, scaling analysis, and to apply it to the protein condensation process. This approach offers insight into a universal aspect of protein phase separation, and also provides a practical way to determine the phase boundary. The observed fat-tailed distribution of protein droplet sizes is not what is normally observed in more standard phase separation systems in the subsaturated phase. Our contribution is not to propose a theoretical model, but rather to report the observation of a scaling behaviour. 

      (1) The theoretical analysis in this study is based on experimental data on condensed droplet size distributions for FUS and α-synuclein. The size data for FUS droplet is indirect as it relies on the assumption that FUS droplet diameter is proportional to fluorescence intensity of labeled FUS (page 10 of manuscript), with fluorescence data adopted from a previously published work by another group (Kar et al. & Pappu, ref.27). Because fluorescence of a droplet is expected to be dependent upon the condensed-phase concentration of FUS, this proportional relationship, even if it holds, must also be modulated by FUS concentration in the droplet. Moreover, why should fluorescence be proportional to diameter but not the cross-sectional area or volume of the FUS droplet, which would be more intuitive? These issues should be clarified. A new measure by microscopy is used to determine the size distribution of condensed α-synuclein; but no microscopy image is shown. It is of critical importance that such raw data (for example microscopy images) be presented for the completeness and reproducibility of the experiment because the entire study relies on the soundness of these experimental measurements. 

      As we mentioned in the article, for the scaling analysis, the droplet dimensions could be assessed in 1D (length), 2D (area) or 3D (volume). For the FUS experiments, we used the data as the authors provided in the original publication (PNAS 2022). For alpha-synuclein, we provided the data in the article. 

      (2) Despite the authors' claim of a universal scaling relationship, the log-log scatter plots in Figure 1 (page 15 of the manuscript) exhibit significant deviations from linearity at low protein concentrations (ρ→0). Given this fact, is universal scaling really valid? Discussion of this behavior is conspicuously absent (except the statement that these data points are excluded in the fit). In any case, the possible origins of these deviations should be thoroughly discussed so that the regime of universal scaling can be properly delineated. 

      In general, one would expect the scaling ansatz to be valid close to the phase boundary. It is a feature of the ansatz that, further away from the boundary, deviations are expected because critical phenomena become less relevant.

      (3) Droplet size distribution most likely depends on the time duration after the preparation of the sample. For α-synuclein, "liquid droplet size characterisation images were captured 10 minutes post-liquid droplet formation" (page 9 of the manuscript). Why 10 minutes? Have the authors tried imaging at different time points and, if so, do the distributions at different time points remain essentially the same? If they are different, what is the criterion for focusing only on a particular time point? Information related to these questions should be provided. 

      We have now determined the droplet size distribution of alpha-synuclein at different time points, finding that they are not dependent on time within experimental uncertainties (Figure 6 in the revised manuscript).

      (4) At least two well-known mechanisms can lead to the time-dependent distribution of liquid droplet sizes: (i) coalescence of droplets in spatial proximity to form a larger droplet, and (ii) Ostwald ripening, i.e., formation of larger droplets concomitant with the dissolution of smaller droplets without fusion of droplets. The implications of these mechanisms on the authors' droplet size distributions should be addressed. Indeed, maintaining a size distribution against these mechanisms in vivo often requires active suppression [Bressloff, Phys Rev E 101, 042804 (2020)] with possible involvement of chemical reactions [Kirschbaum & Zwicker, J R Soc Interface 18, 20210255 (2021)]. These considerations are central to the basic rationale of this study and therefore should be carefully tackled. 

      These two mechanisms of growth are relevant above the critical concentration. Below the critical concentration, which is the regime that we investigated in our work, there is no need for active suppression.

      (5) If coalescence and/or Ostwald ripening do occur, given sufficient time after sample preparation, the condensed phase may become a single large "droplet" or a single liquid layer. Does this occur in the authors' experiments? 

      As we are below the critical concentration, this is unlikely to occur, as indeed supported by the experiments mentioned at point (3). 

      (6) It is unclear whether the authors aim to address the kinetic phenomenon of liquid droplet formation and evolution or equilibrium properties. The two types of phenomena appear to be conflated in the authors' narrative. Clarification is needed. If this work aims to address time-independent (or infinite-time) equilibrium properties, how are they expected to be related to droplet size distribution, which most likely is time-dependent?

      Our analysis focuses on the equilibrium properties of the droplet size distribution below the critical concentration, and it should guide the proposal of a theoretical model that explains the emergence of scaling. In the introductory part of our manuscript, we proposed a possible scenario that tries to extend the Flory-Huggins theory to predict a scaling behaviour appropriate to a critical transition. Other scenarios are possible, and our results, along with further experiments, are needed to arrive at a deeper understanding of protein aggregation.

      (7) The relationship between the potentially time-dependent droplet size distribution and equilibrium properties of ρt and ρc (transition and critical concentrations, respectively) should be better spelled out. An added illustrative figure will be helpful. 

      We are addressing equilibrium properties, not kinetic ones. See also our answer to point (6).

      (8) The authors comment that their findings appear to be inconsistent with Flory-Huggins theory because Flory-Huggins "characterizes droplet formation as a consequence of nucleation ..." (page 8 of the manuscript). Here, three issues need detailed clarification: (i) In what way does Flory-Huggins mandate nucleation? (ii) Why are the findings of apparent scale invariance inconsistent with nucleation? (iii) If liquid droplet formations do not arise from nucleation, what physical mechanism(s) is (are) envisioned by the authors to be underpinning the formation of condensed liquid droplets in protein phase separation? 

      We do agree that the Flory-Huggins theory does not mandate nucleation above the spinodal line. However, we are addressing the equilibrium properties below the critical concentration, so the stable phase is the dilute phase, and there is no nucleation.

      (9) Are any of the authors' findings related to finite-system effects of phase separation [see, e.g., Nilsson & Irbäck, Phys Rev E 101, 022413 (2020)]?  

      Our experimental system is macroscopic, so we would not expect finite size effects.

      (10) Since the authors are using their observation of an apparent scale-invariant droplet size distribution to evaluate phase separation theory, it is important to clarify whether their findings provide any constraint on the shape of coexistence curves (phase diagrams). 

      We are only reporting the phenomenological observation of a scaling behaviour, so we may not speculate at this stage on the constraints of the coexistence curves. This is indeed an exciting opportunity for future studies.

      (11) More specifically, do the authors' findings suggest that the phase diagrams predicted by Flory-Huggins are invalid? Or, are they suggesting that even if the phase diagrams predicted by Flory-Huggins are empirically correct (if verified by experimental testing), they are underpinned by a free energy function different from that of Flory-Huggins? It is important to answer this question to clarify the implications of the authors' findings on equilibrium phase behaviors and the falsifiability of the implications. 

      As mentioned above, our main conclusion is that the droplet size distribution follows a scaling behaviour. Our contribution is not to propose a theoretical model, but rather to report a scaling behaviour that should be accounted for by existing or future theoretical models.

      (12) How about the implications of the authors' findings on other theories of protein phase separation that are based on interactions that are different from the short spatial range interactions treated by Flory-Huggins? For instance, it has been observed that whereas the Flory-Huggins-predicted phase diagrams always convex upward, phase diagrams for charged intrinsically disordered proteins with long spatial range Coulomb interactions exhibit a region that concave upward [Das et al., Phys Chem Chem Phys 20, 28558-28574 (2018)]. Can information be provided by the authors' findings regarding apparent scale-invariant droplet size distribution on the underlying interaction driving the protein molecules toward phase separation? 

      This is an interesting point for future studies about the type of interactions that give rise to the observed scaling behaviour.

      (13) Table S1 (page 4) and Table S2 (page 7) are mentioned in the text but these tables are not in the submitted files. 

      We have added the Supplementary Tables as well as the source files for the figures.

      (14) The two systems studied (FUS and α-synuclein) have a single intrinsically disordered protein (IDP) component. It is not clear if the authors expect their claimed scaling relation to be applicable to systems with multiple IDP components and if so, why.

      From the data that we have currently analysed, we feel that we may not speculate on this interesting point, leaving it to future studies.

    1. eLife Assessment

      This study presents a valuable finding about the role of Perk (Protein kinase RNA-like endoplasmic reticulum kinase) and Atf4 (Activating Transcription Factor-4) in the integrated neurodegenerative and regenerative responses following the optic nerve injury. The authors present solid evidence, combining newly generated transcriptomic data with publicly available datasets to strengthen their findings. Despite some limitations in data quality and interpretation, the study is likely to be of interest to researchers studying optic neuropathies and axonal regeneration.

    2. Reviewer #1 (Public review):

      Somasundaram and colleagues explore the role of transcription factors in retinal ganglion cell (RGC) death and axonal regeneration after a disease relevant insult (mechanical axonal injury). The work significantly extends our knowledge of the role of MAPK and integrated stress response (ISR) in controlling RGC fate after injury. Specifically, the manuscript shows that after axonal injury PERK-activated ISR acts through Atf4 to drive a prodeath transcriptional response in RGCs, in part by crosstalk with the prodeath JUN transcriptional program. Also, and perhaps most interesting, the work shows that PERK-ATF4 pathway activation is pro-regenerative for RGC axons. A major plus of the manuscript is that many new RNA-seq datasets are generated that describe the major prodegenerative and proregenerative gene networks altered after axonal injury. A limitation of the study is that it does not directly compare the effect of inhibiting the PERK-ATF4 pathway with inhibiting JUN and/or JUN-CHOP double deficient animals. It would also be useful, for the cell survival experiments shown in Figure 1, to examine a longer time point than 14 days to understand the long-term consequence of manipulating the PERK-ATF4 pathway.

    3. Reviewer #2 (Public review):

      This manuscript investigates the role of Perk (Protein kinase RNA-like endoplasmic reticulum kinase) and Atf4 (Activating Transcription Factor-4) in neurodegenerative and regenerative responses following optic nerve injury. The authors employed conditional knockout mice to examine the impact of the Perk/Atf4 pathway on transcriptional responses, with a particular focus on canonical Atf4 target genes and the involvement of C/ebp homologous protein (Chop).

      The study demonstrates that Perk primarily operates through Atf4 to stimulate both pro-apoptotic and pro-regenerative responses after optic nerve injury. This Perk/Atf4-dependent response encompasses canonical Atf4 target genes and limited contributions from Chop, exhibiting overlap with c-Jun-dependent transcription. Consequently, the Perk/Atf4 pathway appears crucial for coordinating neurodegenerative and regenerative responses to central nervous system (CNS) axon injury. Additionally, the authors observed that neuronal knockout of Atf4 mimics the neuroprotection resulting from Perk deficiency. Moreover, Perk or Atf4 knockout hinders optic axon regeneration facilitated by the deletion of the tumor suppressor Pten.

      These findings contrast with the transcriptional and functional outcomes reported for CRISPR targeting of Atf4 or Chop, revealing a vital role for the Perk/Atf4 pathway in orchestrating neurodegenerative and regenerative responses to CNS axon injury.

      However, the main concern is the overall data quality, which appears to be suboptimal. The transfection efficiency of AAV2-hSyn1-mTagBFP2-ires-Cre used in this study does not seem highly effective, as evidenced by the data presented in Supplementary Figure 1. The manuscript also contains several inconsistencies and a mix of methods in data collection, analysis, and interpretation, such as the labeling and quantification of RGCs and the combination of bulk and single-cell sequencing results.

      Despite these limitations, the study offers valuable insights into the role of the Perk/Atf4 pathway in determining neuronal fate after axon injury, emphasizing the significance of understanding the molecular mechanisms that govern neuronal survival and regeneration. This knowledge could potentially inform the development of targeted therapies to promote neuroprotection and CNS repair following injury.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      A limitation of the study is that it does not directly compare the effect of inhibiting the PERK-ATF4 pathway with inhibiting JUN and/or JUN-CHOP double deficient animals. It would also be useful, for the cell survival experiments shown in Figure 1, to examine a longer time point than 14 days to understand the long-term consequence of manipulating the PERK-ATF4 pathway.

      We appreciate that both suggestions are fantastic ideas for future studies but consider them to be beyond the scope of this investigation. 

      Reviewer #2 (Public Review):

      However, the main concern is the overall data quality, which appears to be suboptimal. The transfection efficiency of AAV2-hSyn1-mTagBFP2-ires-Cre used in this study does not seem highly effective, as evidenced by the data presented in Supplementary Figure 1.

      We appreciate the importance of the transfection efficiency of AAV2-hSyn1-mTagBFP2-ires-Cre to the interpretation of our results and acknowledge that the imaging and color schemes used required improvement. We have now validated widespread knockout in RGCs using AAV2-hSyn1-mTagBFP2-ires-Cre, improving the staining and imaging of LSL.tdTomato Cre reporter mice (Figure S1A-B) and using RNAScope to validate the disruption of ATF4 and CHOP, respectively, in the RGCs of ATF4 cKO and CHOP cKO mice (Figure S1C-D). Additional validation of functional knockout of these transcription factors is provided by reduction of RGC-autonomous expression of transcripts that we identified in this study to be injury-regulated in an ATF4-dependent (Chac1, Atf3, Figure 4C-E) or ATF4- and CHOP-dependent manner (Ecel1, Avil, Figure 4C-E and Figure S2D).

      The manuscript also contains several inconsistencies and a mix of methods in data collection, analysis, and interpretation, such as the labeling and quantification of RGCs and the combination of bulk and single-cell sequencing results.

      Regarding the use and comparison of bulk-seq and scRNA-seq data, it is our sense that these innovative approaches will be among the impactful aspects of this study. Numerous transcriptomic studies of the optic nerve crush model exist, though it has been unclear whether major and minor technical differences would preclude deriving insights across studies without the expense and time of exact reproduction. One goal of this study was to evaluate the hypothesis that, despite the obvious limitation that RGCs represent fewer than 1% of cells in a whole retina bulk transcriptomics approach, the signals amongst top differentially expressed genes (DEGs) would be dominated by injury-induced changes within RGCs and that the most robust of these changes would be readily detected across techniques and labs, serving as a cornerstone for interpreting similarities and differences in findings. We believe that the results validate this approach. Important insights gained in this study from these cross-study and cross-platform analyses include:

      (1) Genes that we identify in this study as neuronal ATF4-dependent by whole retina transcriptomics include many of the most robust gene expression changes observed across multiple studies that enrich for RGCs and those that only report RGC-autonomous expression changes by scRNA-seq. This observation predicts that many of the ATF4-dependent expression changes that we report are RGC autonomous, which we further validate in this revision by RNAScope.

      (2) Similarly designed whole transcriptomics studies across labs can be remarkably robust for top DEGs, showing striking similarity that allows for meaningful insights and testable hypotheses across different knockout and conditional knockout mice.

      (3) scRNA-seq of RGCs and bulk sequencing of FACS-enriched RGCs unsurprisingly result in higher sensitivity for injury-induced expression changes, but the high degree of similarity that we demonstrate between the top DEGs from those studies and whole retina transcriptomics studies allows for confident inferences regarding the expected cell autonomy of reported expression changes in this model, using available resources such as the Single Cell Portal, without the expense and technical optimization required for extensive spatial transcriptomics across numerous mouse models.

      Other revisions

      In addition to these updates to address the public reviews, we are grateful for the reviewers’ additional recommendations and provide these further revisions:

      (1) We appreciate the request to clarify with a schematic the differences between our study and a previous report (Tian et al., 2022). A second Correction to that study was published in July 2024, resulting in changes to the logFC values used in our original cross-study comparison and adjustments to multiple figures and tables related to the proposed transcriptional programs of ATF4, CHOP, and the other purported core transcription factors. We have therefore updated our Figure S3A-C in accordance with that Correction to better reflect the underlying data of that study. These changes do not alter our original conclusions that: (a) both the whole retina transcriptomics approach of our study and the FACS-enriched RGC approach of that study readily detect the strong upregulation of many known ATF4 target genes after optic nerve crush (Figure S3A); and (b) there are striking differences in the ATF4- and CHOP-dependent transcripts suggested by our cKO data and those suggested by the reported gRNA data. Though we had hoped that the Correction would allow us in this revision to diagram those findings and model for comparison to these cKO findings, documenting those changes and their impacts on the proposed model is beyond the scope of this study.

      (2) We agree that the discordance between the gene and protein names for Ddit3/CHOP and Eif2ak3/PERK represents a challenge for clarity, even when gene names are carefully selected when referring to genes or transcripts and protein names when referring to proteins. We have therefore attempted to streamline the naming throughout, using where possible both names.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) I was surprised to see that the Authors have failed to address my major concerns about the paper, which was in the Main text of the Review.

      Previously I wrote: The major weakness of the manuscript is that it is written for a very specialized reader who has a strong background in cerebellar development, making it hard to read for eLife's general audience. It's challenging to follow the logic of some of the experiments as well as to contextualize these findings in the field of cerebellar development.

      This has not been addressed. The manuscript has not been substantively changed and it is still written for a very specialized reader rather than a general reader.

      We appreciate the respected reviewer’s concern and have made substantial revisions throughout the manuscript to address the points. We have simplified the technical language throughout the manuscript and included additional background information, particularly in the introduction and discussion sections, to better orient general readers. Additionally, we have clarified the logical flow of the experiments by incorporating transitional statements and summaries that explain the purpose and outcomes of each experiment (revisions are highlighted in yellow). 

      (2) These two have been addressed, although to be honest, I don't think that the cartoon is particularly helpful for a general audience.

      Thank you for your feedback. We have replaced the cartoon with a revised version that provides more detailed information to clarify and simplify the origins of cerebellar nuclei from the caudal and rostral ends in both Atoh1+/+ and Atoh1-/- mice. We believe this will make the content more clear and informative for the general audience.

      (3) My third recommendation, that they include a section in the Discussion to speculate about what these cells may become in the adult and the existence of multiple cell types with different molecular markers and projection patterns in the nuclei, has also not been addressed.

      We apologize for the oversight in the previous revision. We have now added a detailed discussion in the manuscript that speculates on the potential fate of these newly identified cells in the adult cerebellum, suggesting that they may differentiate into excitatory neurons (highlighted on page 9). In addition, as noted in our previous resubmission, further direct evidence is needed from the early population of SNCA+ cells during E9 to E13. This is an ongoing focus of investigation in our lab, where we are currently using SNCA-GFP mice, part of a project for a PhD student in our lab.

      Reviewer #2 (Recommendations For The Authors):

      One small remaining issue: The methods text re cell counts remains confusing: n=3

      EMBRYOS???

      "To assess the number of OTX2-positive cells, we conducted immunohistochemistry (IHC) labeling on slides containing serial sections from embryonic days 12, 13, 14, and 15 (n=3 EMBRYOS??? at each timepoint)."

      Thank you for raising this point. We have revised the text in the Methods section for clarity. As highlighted on page 11, “The sample size was equal to 9 embryos” and on page 16, “3 embryos were used at each time point”.

    2. eLife Assessment

      The authors are interested in the developmental origin of the neurons of the cerebellar nuclei. In this study, they identify a population of neurons with a specific complement of markers that originate in a distinct location from where cerebellar nuclear precursor cells have been thought to originate that show distinct developmental properties. The discovery of a new germinal zone giving rise to a new population of neurons is an exciting finding, and it enriches our understanding of cerebellar development. The important claims, better explained in the current version, are well supported by solid evidence with the authors using a wide range of technical approaches, including transgenic mice that allow them to disentangle the influence of distinct developmental organizers.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors are interested in the developmental origin of the neurons of the cerebellar nuclei. They identify a population of neurons with a specific complement of markers originating in a distinct location from where cerebellar nuclear precursor cells have been thought to originate that show distinct developmental properties. The cerebellar nuclei have been well studied in recent years to understand their development through an evolutionary lens, which supports the importance of this study. The discovery of a new germinal zone giving rise to a new population of CN neurons is an exciting finding, and it enriches our understanding of cerebellar development, which has previously been quite straightforward, where cerebellar inhibitory cells arise from the ventricular zone and the excitatory cells arise from the rhombic lip.

      Strengths:

      One of the strengths of the manuscript is that the authors use a wide range of technical approaches, including transgenic mice that allow them to disentangle the influence of distinct developmental organizers such as ATOH.

      Their finding of a novel germinal zone and a novel population of CN neurons is important for developmental neuroscientists and cerebellar neuroscientists.

      Weaknesses:

      One important question raised by this work is what do these newly identified cells eventually become in the adult cerebellum. Are they excitatory or inhibitory? Do they correspond to a novel cell type or perhaps one of the cell classes that have been recently identified in the cerebellum (e.g. Fujita et al., eLife, 2020)? Understanding this would significantly bolster the impact of this manuscript.

      The major weakness of the manuscript is that it is written for a very specialized reader who has a strong background in cerebellar development, making it hard to read for eLife's general audience. It's challenging to follow the logic of some of the experiments as well as to contextualize these findings in the field of cerebellar development.

    4. Reviewer #2 (Public Review):

      Summary:

      Canonically cerebellar neurons are derived from 2 primary germinal zones within the anterior hindbrain (dorsal rhombomere 1). This manuscript identifies an important, previously underappreciated origin for a subset of early cerebellar nuclei neurons - likely the mesencephalon. This is an exciting finding.

      Strengths:

      The authors have identified a novel early population of cerebellar neurons with likely novel origin in the midbrain. They have used multiple assays to support their conclusions, including immunohistochemistry and in situ analyses of a number of markers of this population which appear to stream from the midbrain into the dorsal anterior cerebellar anlage.

      The inclusion of Otx2-GFP short term lineage analyses and analysis of Atoh1 -/- animals also provide considerable support for the midbrain origin of these neurons as streams of cells seem to emanate from the midbrain. However, without live imaging there remains the possibility that these streams of cells are not actually migrating and rather, gene expression is changing in static cells. Hence the authors have conducted midbrain DiI labelling experiments of short term and long term cultured embryos showing DiI-labelled cells in the developing cerebellum. These studies confirm migration of cells from the midbrain into the early cerebellum.

      The authors have appropriately responded to review issues, replacing panels in figures and updating legends and text. They have also appropriately noted the limitations of their work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Fats and lipids serve many important roles in cancers, including serving as important fuels for energy metabolism in cancer cells by being oxidized in the mitochondria. The process of fatty acid oxidation is initiated by the enzyme carnitine palmitoyltransferase 1A (CPT1A), and the function and targetability of CPT1A in cancer metabolism and biology have been heavily investigated. This includes studies that have found important roles for CPT1A in colorectal cancer growth and metastasis.

      In this study, Chen and colleagues use analysis of patient samples and functional interrogation in animal models to examine the role CPT1A plays in colorectal cancer (CRC). The authors find that CPT1A expression is decreased in CRC compared to paired healthy tissue and that lower expression correlates with decreased patient survival over time, suggesting that CPT1A may suppress tumor progression. To functionally interrogate this hypothesis, the authors both use CRISPR to knockout CPT1A in a CRC cell line that expresses CPT1A and overexpress CPT1A in a CRC cell line with low expression. In both systems, increased CPT1A expression decreased cell survival and DNA repair in response to radiation in culture. Further, in xenograft models, CPT1A decreased tumor growth basally and radiotherapy could further decrease tumor growth in CPT1A-expressing tumors. As CRC is often treated with radiotherapy, the authors argue this radiosensitization driven by CPT1A could explain why CPT1A expression correlates with increased patient survival.

      Lastly, Chen and colleagues sought to understand why CPT1A suppresses CRC tumor growth and sensitizes the tumors to radiotherapy in culture. The antioxidant capacity of cells can increase cell survival, so the authors examine antioxidant gene expression and levels in CPT1A-expressing and non-expressing cells. CPT1A expression suppresses the expression of antioxidant metabolism genes and lowers levels of antioxidants. Antioxidant metabolism genes can be regulated by the FOXM1 transcription factor, and the authors find that CPT1A expression regulates FOXM1 levels and that antioxidant gene expression can be partially rescued in CPT1A-expressing CRC cells. This leads the authors to propose the following model: CPT1A expression downregulates FOXM1 (via some yet undescribed mechanism) which then leads to decreased antioxidant capacity in CRC cells, thus suppressing tumor progression and increasing radiosensitivity. This is an interesting model that could explain the suppression of CPT1A expression in CRC, but key tenets of the model are untested and speculative.

      Strengths:

      Analysis of CPT1A in paired CRC tumors and non-tumor tissue using multiple modalities combined with analysis of independent datasets rigorously show that CPT1A is downregulated in CRC tumors at the RNA and protein level.

      The authors use paired cell line model systems where CPT1A is both knocked out and overexpressed in cell lines that endogenously express or repress CPT1A respectively. These complementary model systems increase the rigor of the study.

      The finding that a metabolic enzyme generally thought to support tumor energetics actually is a tumor suppressor in some settings is theoretically quite interesting.

      We would like to thank Reviewer #1 for the positive comments.

      Weaknesses:

      The authors propose that CPT1A expression modulates antioxidant capacity in cells by suppressing FOXM1 and that this pathway alters CRC growth and radiotherapy response. However, key aspects of this model are not tested. The authors do not show that FOXM1 contributes to the regulation of antioxidant levels in CRC cells and tumors or if FOXM1 suppression is key to the inhibition of CRC tumor growth and radiosensitization by CPT1A. Thus, the model the authors propose is speculative and not supported by the existing data.

      We thank the reviewer for the valuable comment. In this study, we employed Western blotting to assess the protein levels of the ROS scavenging enzymes CAT, SOD1, and SOD2 following FOXM1 overexpression. This approach allowed us to evaluate how FOXM1 regulates ROS clearance and mediates cellular radiation resistance. Further in-vivo evidence is needed and will be addressed in future research.

      The authors propose two mechanisms by which CPT1A expression triggers radiosensitization: decreasing DNA repair capacity (Figure 3) and decreasing antioxidant capacity (Figure 5). However, while CPT1A expression does alter these capacities in CRC cells, neither is functionally tested to determine if altered DNA repair or antioxidant capacity (or both) are the reason why CRC cells are more sensitive to radiotherapy or are delayed in causing tumors in vivo. Thus, this aspect of the proposed model is also speculative.

      We thank the reviewer for the valuable comment. In this study, we combined a colony formation assay, multi-target single-hit survival model, comet assay, and Western blotting (for γH2AX) to evaluate DNA damage and repair in cells. Additionally, we employed qPCR, Western blotting, and enzyme activity kits to assess the direct ROS-scavenging activities of the peroxisomal enzymes CAT, SOD1, SOD2, and SOD3.

      The authors find that CPT1A affects radiosensitization in cell culture and assess this in vivo. In vivo, CPT1A expression slows tumor growth even in the absence of radiotherapy, and radiotherapy only proportionally decreases tumor growth to the same extent as it does in CPT1A non-expressing CRC tumors. The authors propose from this data that CPT1A expression also sensitizes tumors to radiotherapy in vivo. However, it is unclear whether CPT1A expression causes radiosensitization in vivo or if CPT1A expression acts as an independent tumor suppressor to which radiotherapy has an additive effect. Additional experiments would be necessary to differentiate between these possibilities.

      We thank the reviewer for the valuable comment. As shown in Figure 4D, in the absence of CPT1A knockdown, radiotherapy reduced the percentage of Ki67-positive cells in the xenograft tumors by 32.9% (approximately 39.6% of the pre-irradiation baseline). In contrast, upon CPT1A knockdown, radiotherapy only led to a 14.5% reduction in the percentage of Ki67-positive cells (approximately 15.6% of the pre-irradiation baseline). Furthermore, as illustrated in Figures 4E and 4F, in the absence of CPT1A overexpression, radiotherapy resulted in a 0.10-g decrease in tumor weight (around 52.5% of the pre-irradiation weight), whereas with CPT1A overexpression, radiotherapy induced a more pronounced 0.12-g reduction in tumor weight (approximately 89.7% of the pre-irradiation weight). Collectively, these findings indicate that CPT1A exhibits a radiosensitising effect. We have incorporated these relevant details in the Results section (Lines 196-201 and 204-208).
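
      The relation between the absolute and relative reductions quoted above can be checked with a short sketch. It assumes that each parenthetical percentage is the absolute reduction divided by the non-irradiated baseline, and that the Ki67 reductions are percentage points. The function name `implied_baseline` is hypothetical, and the back-calculated baselines are illustrative only, not values reported in the manuscript.

```python
def implied_baseline(absolute_drop: float, relative_drop_pct: float) -> float:
    """Back-calculate a baseline from an absolute drop and the same drop
    expressed as a percentage of that baseline (assumed relationship)."""
    if relative_drop_pct <= 0:
        raise ValueError("relative_drop_pct must be positive")
    return absolute_drop / (relative_drop_pct / 100.0)


# (absolute reduction, reduction as % of baseline), as quoted in the response.
quoted = {
    "Ki67, no CPT1A knockdown (percentage points)": (32.9, 39.6),
    "Ki67, CPT1A knockdown (percentage points)": (14.5, 15.6),
    "Tumor weight, no CPT1A overexpression (g)": (0.10, 52.5),
    "Tumor weight, CPT1A overexpression (g)": (0.12, 89.7),
}

for label, (drop, rel_pct) in quoted.items():
    baseline = implied_baseline(drop, rel_pct)
    print(f"{label}: implied baseline = {baseline:.3g}")
```

      Comparing relative rather than absolute reductions is what separates radiosensitization from a purely additive effect, because the CPT1A-expressing tumors start from a smaller baseline.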

      The authors propose in Figure 3 that DNA repair capacity is inhibited in CRC cells by CPT1A expression. However, the gH2AX immunoblots performed in Figure 3H-I that measure DNA repair kinetics are not convincing that CPT1A expression impairs DNA repair kinetics. Separate blots are shown for CPT1A expressing and non-expressing cell lines, not allowing for rigorous comparison of gH2AX levels and resolution as CPT1A expression is modulated.

      We thank the reviewer for the valuable comment. In this study, we also employed a colony formation assay, multi-target single-hit survival model, and comet assay to elucidate the impact of CPT1A on DNA repair capacity. These methods all indicated that DNA repair capacity is inhibited in CRC cells by CPT1A expression.

      There are conflicting studies (PMID: 37977042, 29995871) that suggest that CPT1A is overexpressed in CRC and contributes to tumor progression rather than acting as a tumor suppressor as the authors propose. It would be helpful for readers for the authors to discuss these studies and why there is a discrepancy between them.

      We thank the reviewer for the valuable comment. We have expanded the discussion of these findings in the relevant section of the manuscript (Lines 317-318). We speculated that the differences between our observations and previous reports may be attributable to the inherent heterogeneity of tumor tissues as well as variations in tumor stage.

      Reviewer #2 (Public Review):

      The manuscript by Chen et al. describes how low levels of CPT1A in colorectal cancer (CRC) confer radioresistance by expediting radiation-induced ROS clearance. The authors propose that this mechanism of ROS homeostasis is regulated through FOXM1. CPT1A is known for its role in fatty acid metabolism via beta-oxidation of long-chain fatty acids, making it important in many metabolic disorders and cancers.

      Previous studies have suggested that the upregulation of CPT1A is essential for the tumor-promoting effect in colorectal cancers (CRC) (PMID: 32913185). For example, CPT1A-mediated fatty acid oxidation promotes colorectal cancer cell metastasis (PMID: 29995871), and repression of CPT1A activity renders cancer cells more susceptible to killing by cytotoxic T lymphocytes (PMID: 37722058). Additionally, inhibition of CPT1A-mediated fatty-acid oxidation (FAO) sensitizes nasopharyngeal carcinomas to radiation therapy (PMID: 29721083). While this suggests a tumor-promoting effect for CPT1A, the work by Chen et al. suggests instead a tumor-suppressive function for CPT1A in CRC, specifically that loss or low expression of CPT1A confers radioresistance in CRC. This makes the findings important given that they oppose the previously proposed tumorigenic function of CPT1A. However, the data presented in the manuscript is limited in scope and analysis.

      Major Limitations:

      (1) Analysis of Patient Samples

      - Figure 1D shows that CPT1A levels are significantly lower in COAD and READ compared to normal tissues. It would be beneficial to show whether CPT1A levels are also significantly lower in CRC compared to other tumor types using TCGA data.

      We thank the reviewer for the valuable comment. We assessed the expression levels of CPT1A across all cancer types in the TCGA dataset and found that the abundance of CPT1A in CRC was significantly lower compared to cholangiocarcinoma (CHOL), esophageal carcinoma (ESCA), kidney chromophobe (KICH), acute myeloid leukemia (LAML), and stomach adenocarcinoma (STAD) (Author response image 1).

      Author response image 1.

      The mRNA level of CPT1A across all cancer types in the TCGA dataset.

      - The analysis should include a comparison of closely related CPT1 isoforms (CPT1B and CPT1C) to emphasize the specific importance of CPT1A silencing in CRC.

      We thank the reviewer for the valuable comment. We further examined the mRNA expression levels of the CPT1 isoforms CPT1B and CPT1C in COAD and READ tumor samples and their respective normal tissue counterparts. The results showed that CPT1B was significantly upregulated in READ tumor samples compared to normal tissues. Similarly, CPT1C was significantly overexpressed in both READ and COAD tumor samples relative to their normal tissue controls (Author response image 2).

      Author response image 2.

      The mRNA expression levels of CPT1B and CPT1C in rectal adenocarcinoma (READ) and colon adenocarcinoma (COAD) based on data from the TCGA database. A. CPT1B expression in READ. B. CPT1B expression in COAD. C. CPT1C expression in READ. D. CPT1C expression in COAD.

      - Figure 2 lacks a clear description of how IHC scores were determined and the criteria used to categorize patients into CPT1A-high and CPT1A-low groups. This should be detailed in the text and figure legend.

      We thank the reviewer for the valuable comment. We have provided a detailed description of the methodology used to determine the IHC scores and criteria applied to categorise patients into CPT1A-high and CPT1A-low groups in the Materials and Methods section (Lines 418-426) as well as the legend of Figure 2A.

      - Neither Figure 2B nor Figure 2C shows how many patients were assigned to the CPT1A-low and CPT1A-high groups.

      We thank the reviewer for the valuable comment. We have added the number of patients in the CPT1A-low and CPT1A-high groups to the legends of Figures 2B and 2C.

      (2) Model Selection and Experimental Approaches

      - The authors primarily use CPT1A knockout (KO) HCT116 cells and CPT1A overexpression (OE) SW480 cells for their experiments, which poses major limitations.

      We thank the reviewer for the valuable comment.

      - The genetic backgrounds of the cell lines (e.g., HCT116 being microsatellite instable (MSI) and SW480 not) should be considered as they might influence treatment outcomes. This should be acknowledged as a major limitation.

      We thank the reviewer for the valuable comment. Indeed, the genetic background differences among cell lines represent a significant limitation. We have addressed this issue in the discussion section (Lines 363-365). 

      - Regardless of their CPT1A expression levels, for the experiments with HCT116 and SW480 cells in Figure 3C-F, it would be useful to see whether HCT116 cells can be further sensitized to radiotherapy in overexpression and whether SW480 cells can be desensitized through CPT1A KO.

      We thank the reviewer for the valuable comment. Due to the inherently high levels of CPT1A in the HCT116 cell line, we attempted to perform relevant experiments but were unable to achieve significant overexpression. Similarly, we faced challenges with the SW480 cell line, which has lower levels of CPT1A. We could thus not provide additional insights in this respect.

      - The use of only two CRC cell lines is insufficient to draw broad conclusions. Additional CRC cell lines should be used to validate the findings and account for genetic heterogeneity. The authors should repeat key experiments with additional CRC cell lines to strengthen their conclusions.

      We thank the reviewer for the valuable comment. To address this issue, we used a radiation-resistant variant of the HCT-15 cell line as a new approach to investigate whether CPT1A is associated with cellular radiation sensitivity. We believe that the data obtained from these acquired resistant cell lines are comparable to those from the ordinary cell lines mentioned in the reviewer’s comment.

      (3) Pharmacological Inhibition

      Several studies have reported beneficial outcomes of using CPT1 pharmacological inhibition to limit cancer progression (e.g., PMID: 33528867, PMID: 32198139), including its application in sensitization to radiation therapy (PMID: 30175155). Since the authors argue for the opposite case in CRC, they should show this through pharmacological means such as etomoxir and whether CPT1A inhibition phenocopies their observed genetic KO effect, which would have important implications for using this inhibitor in CRC patients.

      We thank the reviewer for the valuable comment. The referenced literature has indeed attracted our attention. Our research group is concurrently investigating the role of CPT1A in tumor radiotherapy and immunology, utilising CPT1A inhibitors for experimental validation. We look forward to publishing these related studies to further support the conclusions presented in our manuscript.

      (4) Data Representation and Statistical Analysis

      - The relative mRNA expression levels across the seven cell lines (Supplementary Figure 1C) differ greatly from those reported in the DepMap (https://depmap.org/portal/). This discrepancy should be addressed.

      We thank the reviewer for the valuable comment. The observed differences in mRNA levels may be attributable to variations in cell culture density. For subsequent radiation sensitivity experiments, we maintained our cell culture density at approximately 70–80% confluence.

      - The statistical significance of differences in mRNA and protein levels between RT-sensitive and RT-resistant cells should be shown (Supplementary Figure 1C, D).

      As suggested, we have included a statistical analysis of the differences in CPT1A mRNA levels between RT-sensitive and -resistant cells in Figure 3 and Supplementary Figure 1C. However, further analysis revealed no significant difference in CPT1A protein levels between these groups. This was attributed to the high variability in grayscale values observed between the groups.

      Conclusion

      The study offers significant insights into the role of CPT1A in CRC radioresistance, proposing a tumor-suppressive function. However, the scope and depth of the analysis need to be expanded to fully validate these claims. Additional CRC cell lines, pharmacological inhibition studies, and a more detailed analysis of patient samples are essential to strengthen the conclusions.

      We would like to thank Reviewer #2 for the comments.

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the role of CPT1A in developing resistance to radiotherapy in colorectal cancer (CRC). The manuscript is a collection of assays and analyses to identify the mechanism by which CPT1A leads to treatment resistance through increased expression of ROS-scavenging genes facilitated by FOXM1 and provides an argument to counter this role, leading to a reversal of treatment resistance.

      Strengths:

      The article is well written with sound scientific methodology and results. The assays performed are well within the scope of the hypothesis of the study and provide ample evidence for the role of CPT1A in the development of treatment resistance in colorectal cancer. While providing compelling evidence for their argument, the authors have also rightfully provided limitations of their work.

      We would like to thank Reviewer #3 for the positive comments.

      Weaknesses:

      The primary weakness of the study is acknowledged by the authors at the end of the Discussion section of the manuscript. The work heavily relies on bioinformatics and in vitro work with little backing of in vivo and patient data. In terms of animal studies, it is to be noted that the model they have used is nude mice with non-orthotopic, subcutaneous xenograft, which may not be the best recreation of the patient tumor.

      We thank the reviewer for the insightful comment. Our research group is continuing to explore the role of CPT1A in colorectal cancer radiotherapy and immunotherapy. In a new study, we used a C57BL/6 mouse model to conduct in-vivo experiments. Preliminary data suggest that CPT1A confers heightened radiosensitivity to immunocompetent mice. We look forward to the forthcoming publication of this ongoing research project.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript was challenging to read and contained many typographical errors and tangents that were not relevant to the logic of the paper. For example, in lines 365-367 the authors talk about peroxisomes being important for redox balance and say that they will target peroxisomal pathways. However, the authors do not perform any experiments targeting peroxisomal pathways, which left me quite perplexed. Careful proofreading of the manuscript would improve its utility for readers.

      We thank the reviewer for the insightful comments. We have made several additions throughout the manuscript to include more relevant information and experimental details, thereby improving the manuscript’s logical structure and readability. As described in the text, we used the DCFH-DA probe to measure ROS levels in cells, considering that regulation of intracellular ROS levels is a major function of peroxidases. We examined the transcriptional levels, protein expression, and enzymatic activities of peroxidases such as CAT, SOD1, SOD2, and SOD3 through qPCR, Western blotting, and specific assay kits.

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarification and Flow

      Introduction Clarity: The introduction introduces several topics in succession without clearly connecting them. For example, the introduction of FOXM1 on Line 102 lacks clarity in its relationship to the study. Consider discussing these elements only in the discussion section to avoid confusion.

      We thank the reviewer for this insightful comment. We have moved the section on FOXM1 to the discussion to enhance readability (Lines 342-348).

      Explanation for Non-experts: Both the multi-target single-hit survival model and the comet assay require one sentence to explain their principles for non-experts in the field.

      As suggested, we have included brief explanations of the multi-target single-hit survival model and the comet assay in the Materials and Methods section to clarify these concepts to readers not familiar with the subject (Lines 458-460 and 462-465). 
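
      For readers unfamiliar with the multi-target single-hit model, its textbook form is S(D) = 1 - (1 - exp(-D/D0))^N, where D is the dose, D0 the mean lethal dose, and N the extrapolation number. The sketch below implements that textbook form only. Whether the manuscript uses exactly this parametrization is an assumption, and the function names and the example parameter values are hypothetical.

```python
import math


def survival_fraction(dose_gy: float, d0: float, n: float) -> float:
    """Multi-target single-hit survival: S = 1 - (1 - exp(-D/D0))**N."""
    if dose_gy < 0 or d0 <= 0 or n < 1:
        raise ValueError("require dose_gy >= 0, d0 > 0 and n >= 1")
    return 1.0 - (1.0 - math.exp(-dose_gy / d0)) ** n


def quasi_threshold_dose(d0: float, n: float) -> float:
    """Quasi-threshold dose Dq = D0 * ln(N), the width of the curve's shoulder."""
    if d0 <= 0 or n < 1:
        raise ValueError("require d0 > 0 and n >= 1")
    return d0 * math.log(n)


# Hypothetical parameters, chosen only to show the shape of the curve.
for dose in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(f"{dose:.0f} Gy -> surviving fraction {survival_fraction(dose, 1.5, 2.0):.4f}")
```

      In this form, a more radiosensitive line shows a smaller D0 (steeper terminal slope) or a smaller N (narrower shoulder), which is how fitted curves from colony formation assays are usually compared.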

      (2) Specific Text Revisions

      - Line 302: "We transfected the CRISPR/Cas9 lentivirus into HCT 116 ... efficiency of the 2nd site was the highest" - Clarify what is meant by "second site." If you mean the second sgRNA, please use this term.

      As suggested, we have revised ‘2nd’ to ‘second’ (Lines 151 and 152).

      - Lines 358-359: For the results subsection "Low CPT1A levels accelerate post-radiation ROS scavenging," include an introductory sentence, such as: "To study the mechanism of low CPT1A expression in radiotherapy resistance, we conducted differential gene expression analysis between HCT116 CPT1A KO and NC cells."

      As suggested, we have added an introductory sentence in the section titled ‘Low CPT1A Levels Accelerate Post-Radiation ROS Scavenging’ (Lines 215-217).

      - Line 359: "The gene expression heatmap showed high consistency among replicates for both HCT 116-NC and HCT 116-KO cells (Supplementary Figure 3A)." If these are technical replicates performed on the same batch of KO or NC cells, please state this clearly.

      We have added the suggested information to improve clarity (Line 218).

      - Lines 360-362: "With CPT1A knockdown, we found 363 upregulated and 1290 downregulated genes (|log2(fold change)|>1 and P<0.05)." Ensure that the p-value is correct; it seems this should be q-value < 0.05.

      As suggested, we have revised ‘p’ to ‘q’ (Lines 220 and 496).
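
      The practical difference between a p-value and a q-value cutoff can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg adjustment, which the manuscript may or may not have used; the gene names, input values, and the function names `benjamini_hochberg` and `call_degs` are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def call_degs(records, lfc_cutoff=1.0, q_cutoff=0.05):
    """Split (gene, log2_fold_change, p_value) records into up/down DEGs
    using |log2FC| > lfc_cutoff and q < q_cutoff."""
    q_values = benjamini_hochberg([p for _, _, p in records])
    calls = {"up": [], "down": []}
    for (gene, lfc, _), q in zip(records, q_values):
        if q < q_cutoff and abs(lfc) > lfc_cutoff:
            calls["up" if lfc > 0 else "down"].append(gene)
    return calls


# Hypothetical records: (gene, log2 fold change, raw p-value).
records = [
    ("GENE_A", 2.0, 0.01),
    ("GENE_B", -1.5, 0.04),
    ("GENE_C", 0.5, 0.03),
    ("GENE_D", -3.0, 0.50),
]
print(call_degs(records))
```

      In this toy input, GENE_B passes a raw p < 0.05 cutoff but not the adjusted one, which is why stating which threshold was applied matters for the reported DEG counts.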

      - Line 363: Introduce the term "DEGs" as Differentially Expressed Genes in the main text, not just in the Materials and Methods (line 215).

      As suggested, we have introduced the term "DEGs" as Differentially Expressed Genes in the main text (Lines 221-222).

      - Lines 364-365: "Showing that the main enriched pathways were in peroxisomes, cell cycle nucleotide excision repair, and fatty acid degradation (Figure 5A)." The data does not support this statement. Clarify that the listed pathways are AMONG the enriched KEGG pathways.

      As suggested, we have revised the relevant part in the manuscript (Lines 222-224).

      - Line 370: "...following 6 Gy irradiation and 1 h of incubation with DCFH-DA (Figure 5C)." Write out the term DCFH-DA and explain it for non-experts: "a fluorescent redox probe used to detect reactive oxygen species."

      As suggested, we have added a brief explanation to clarify the term for readers not familiar with the subject (Lines 230-231). 

      - Line 444: "CPT1A is an essential tumor suppressor." This statement has not been validated or referenced adequately.

      As suggested, we have removed the sentence to improve clarity.

      - Line 447: Clarify the relevance of the He, Zhang & Xu reference.

      We apologise for the error and have removed the reference.

      (3) Figure Improvements

      - Standardize Graph Labels: Ensure that graph axis labels and numbering are consistent and legible across the manuscript. For example, Figure 1A has large labels, while Figure 1B has much smaller labels. Ensure all graphs, such as 2C and 3G, have readable labels and numbering.

      We thank the reviewer for the insightful comment. We have revised the labels and numbering in Figures 1B, 2C, and 3G.

      - Figure 2B and 2C: Correct the x-axis label from "mouths" to "months."

      We thank the reviewer for this insightful comment. We have revised the labels in Figure 2B and 2C.

      - Figure 3 Legend: Clarify what is meant by "different groups of cell lines" in the legend of Figure 3. Specify whether these are single clones, pooled clones, or mixtures of cells in the text and/or figure legend.

      We thank the reviewer for this insightful comment. We have updated the legend of Figure 3 to enhance clarity.

      - Figures 3H and 3I: Label the blots clearly to indicate which refer to HCT116 NC and KO and which to SW480 RFP and OE.

      We thank the reviewer for this insightful comment. We have revised the labels in Figure 3H and 3I.

      - Supplementary Figure 2A: Describe the terms F and W in the legend.

      We thank the reviewer for this insightful comment. 'F' denotes fraction and 'W' denotes week. We have updated the legend of Figure 3 and Figure 3-figure supplement 2 to improve clarity.

      - Supplementary Data: Consider moving the data described in Supplementary Figure 2 to the main figures as it is among the most convincing data in the paper.

      We thank the reviewer for this insightful comment. We have decided to retain this figure at its current position, as we believe the data presented provide complementary evidence supporting the conclusion discussed earlier.

    2. eLife Assessment

      This study reports a valuable finding for the treatment of colorectal cancer (CRC), as the authors demonstrated that the enzyme CPT1A plays a significant role in the response to radiotherapy in CRC patients. However, the reviewers found that the results presented are still incomplete.

    3. Reviewer #1 (Public review):

      Summary:

      Fats and lipids serve many important roles in cancers, including serving as important fuels for energy metabolism in cancer cells by being oxidized in the mitochondria. The process of fatty acid oxidation is initiated by the enzyme carnitine palmitoyltransferase 1A (CPT1A), and the function and targetability of CPT1A in cancer metabolism and biology has been heavily investigated. This includes studies that have found important roles for CPT1A in colorectal cancer growth and metastasis.

      In this study, Chen and colleagues use analysis of patient samples and functional interrogation in animal models to examine the role CPT1A plays in colorectal cancer (CRC). The authors find that CPT1A expression is decreased in CRC compared to paired healthy tissue and that lower expression correlates with decreased patient survival over time, suggesting that CPT1A may suppress tumor progression. To functionally interrogate this hypothesis, the authors both use CRISPR to knockout CPT1A in a CRC cell line that expresses CPT1A, and overexpress CPT1A in a CRC cell line with low expression. In both systems, increased CPT1A expression decreased cell survival and DNA repair in response to radiation in culture. Further, in xenograft models CPT1A decreased tumor growth basally and radiotherapy could further decrease tumor growth in CPT1A expressing tumors. As CRC is often treated with radiotherapy, the authors argue this radiosensitization driven by CPT1A could explain why CPT1A expression correlates with increased patient survival.

      Lastly, Chen and colleagues sought to understand why CPT1A suppresses CRC tumor growth and sensitizes the tumors to radiotherapy in culture. Antioxidant capacity of cells can increase cell survival, so the authors examine antioxidant gene expression and levels in CPT1A expressing and non-expressing cells. CPT1A expression suppresses expression of antioxidant metabolism genes and lowers levels of antioxidants. Antioxidant metabolism genes can be regulated by the FOXM1 transcription factor, and the authors find that CPT1A expression regulates FOXM1 levels and that antioxidant gene expression can be partially rescued in CPT1A expressing CRC cells. This leads the authors to propose the following model: CPT1A expression downregulates FOXM1 (via some yet undescribed mechanism) which then leads to decreased antioxidant capacity in CRC cells, thus suppressing tumor progression and increasing radiosensitivity. This is an interesting model that could explain suppression of CPT1A expression in CRC, but key tenets of the model are untested and speculative.

      Strengths:

      • Analysis of CPT1A in paired CRC tumors and non-tumor tissue using multiple modalities combined with analysis of independent datasets rigorously show that CPT1A is downregulated in CRC tumors at the RNA and protein level.
      • The authors use paired cell line model systems where CPT1A is both knocked out and overexpressed in cell lines that endogenously express or repress CPT1A respectively. These complementary model systems increase the rigor of the study.
      • The finding that a metabolic enzyme generally thought to support tumor energetics actually is a tumor suppressor in some settings is theoretically quite interesting.

      Weaknesses:

      • The authors propose that CPT1A expression modulates antioxidant capacity in cells by suppressing FOXM1 and that this pathway alters CRC growth and radiotherapy response. However, key aspects of this model are not tested. The authors do not show that FOXM1 contributes to regulation of antioxidant levels in CRC cells and tumors or if FOXM1 suppression is key to inhibition of CRC tumor growth and radiosensitization by CPT1A. Thus, the model the authors propose is speculative and not supported by the existing data.
      • The authors propose two mechanisms by which CPT1A expression triggers radiosensitization: decreasing DNA repair capacity (Fig. 3) and decreasing antioxidant capacity (Fig. 5). However, while CPT1A expression does alter these capacities in CRC cells, neither is functionally tested to determine if altered DNA repair or antioxidant capacity (or both) are the reason why CRC cells are more sensitive to radiotherapy or are delayed in causing tumors in vivo. Thus, this aspect of the proposed model is also speculative.
      • The authors find that CPT1A affects radiosensitization in cell culture and assess this in vivo. In vivo, CPT1A expression slows tumor growth even in the absence of radiotherapy, and radiotherapy only proportionally decreases tumor growth to the same extent as it does in CPT1A non-expressing CRC tumors. The authors propose from this data that CPT1A expression also sensitizes tumors to radiotherapy in vivo. However, it is unclear whether CPT1A expression causes radiosensitization in vivo or whether CPT1A expression acts as an independent tumor suppressor to which radiotherapy has an additive effect. Additional experiments would be necessary to differentiate between these possibilities.
      • The authors propose in Figure 3 that DNA repair capacity is inhibited in CRC cells by CPT1A expression. However, the gH2AX immunoblots performed in Figure 3H-I that measure DNA repair kinetics are not convincing that CPT1A expression impairs DNA repair kinetics. Separate blots are shown for CPT1A expressing and non-expressing cell lines, not allowing for rigorous comparison of gH2AX levels and resolution as CPT1A expression is modulated.

    4. Reviewer #2 (Public review):

      The manuscript by Chen et al. describes how low levels of CPT1A in colorectal cancer (CRC) confer radioresistance by expediting radiation-induced ROS clearance. The authors propose that this mechanism of ROS homeostasis is regulated through FOXM1. CPT1A is known for its role in fatty acid metabolism via beta-oxidation of long-chain fatty acids, making it important in many metabolic disorders and cancers.

      Previous studies have suggested that upregulation of CPT1A is essential for the tumor-promoting effect in colorectal cancers (CRC) (PMID: 32913185). For example, CPT1A-mediated fatty acid oxidation promotes colorectal cancer cell metastasis (PMID: 29995871), and repression of CPT1A activity renders cancer cells more susceptible to killing by cytotoxic T lymphocytes (PMID: 37722058). Additionally, inhibition of CPT1A-mediated fatty acid oxidation (FAO) sensitizes nasopharyngeal carcinomas to radiation therapy (PMID: 29721083). While this suggests a tumor-promoting effect for CPT1A, the work by Chen et al. suggests instead a tumor-suppressive function for CPT1A in CRC, specifically that loss or low expression of CPT1A confers radioresistance in CRC. This makes the findings important given that they oppose the previously proposed tumorigenic function of CPT1A.

      The study has several strengths. The authors employ both in vitro and in vivo models to demonstrate that low CPT1A levels lead to radioresistance in CRC cells. They use isogenic HCT15 CRC cell lines that are radioresistant and show that overexpression of CPT1A sensitizes these cells to radiotherapy. Interestingly, the radioresistant cells exhibit lower CPT1A levels, suggesting that downregulation of CPT1A may be involved in the acquisition of radioresistance. Throughout the manuscript, the authors acknowledge the limitations of their work and avoid overextending their conclusions.

      However, there are some major limitations to the study:

      (1) Unexplored Contradictions with Previous Studies
      While the authors propose a tumor-suppressive function for CPT1A in CRC, they do not sufficiently address the contradiction with prior studies that indicate a tumor-promoting role for CPT1A. The discussion briefly mentions that this discrepancy may stem from heterogeneity or differences in tumor stages, but a more thorough exploration is needed. Delving deeper into the contexts and conditions under which CPT1A exhibits differing roles would be critical for reconciling these findings and guiding future research.

      (2) Limited Patient Data Analysis
      The authors demonstrate that CPT1A levels are significantly lower in COAD (colon adenocarcinoma) and READ (rectal adenocarcinoma) compared to normal tissues. However, data from TCGA indicate that CPT1A expression levels are lower in 26 out of 31 tumor types compared to COAD or READ (as noted in the authors' response to the previous review). It is possible that reduced CPT1A expression might be a common feature across various cancers, not just CRC. A more comprehensive analysis comparing matched normal and tumor tissues across different cancer types would clarify whether the observed phenomenon is unique to CRC or part of a broader pattern. This is particularly important since several studies have reported CPT1A overexpression in tumors.

      (3) Limitations in Experimental Scope
      The experimental design primarily involves CPT1A knockout in HCT116 cells and CPT1A overexpression in SW480 cells, which may limit the generalizability of the findings. Utilizing additional cell lines would account for genetic heterogeneity and enhance the robustness of the conclusions. Moreover, while the authors suggest an opposing effect of CPT1A in CRC compared to other studies, they have not investigated this through pharmacological means. Previous studies have shown that pharmacological inhibition of CPT1A can limit cancer progression (e.g., PMID: 33528867, PMID: 32198139) and sensitize cells to radiation therapy (PMID: 30175155). Testing whether pharmacological inhibitors like etomoxir or ST1326 replicate the effects observed with genetic knockout would provide valuable insights and have significant implications for therapeutic strategies in CRC patients.

      Conclusion

      This study offers valuable insights into the role of CPT1A in CRC radioresistance, proposing a tumor-suppressive function that challenges previous findings of its tumor-promoting role. While the findings are interesting and could have significant implications for cancer therapy, the limitations in experimental scope and the lack of a thorough discussion reconciling contradictory evidence warrant caution. Expanding the research to include a wider range of CRC cell lines, conducting pharmacological inhibition studies, and performing more detailed analyses would strengthen the conclusions and enhance our understanding of CPT1A's complex role in cancer progression and treatment response.

    1. eLife Assessment

      This is a valuable study of the physiological mechanisms promoting network activity during fever in the mouse neocortex. Most of the supporting evidence is solid; however, there is incomplete support for the conclusion that the E/I balance is unchanged with temperature.

    2. Reviewer #1 (Public review):

      The paper by Chen et al. describes the role of neuronal thermo-TRPV3 channels in the firing of cortical neurons at a fever temperature range. The authors began by demonstrating that exposure to infrared light increasing ambient temperature causes body temperature to rise to a fever level above 38°C. Subsequently, they showed that at the fever temperature of 39°C, the spike threshold (ST) increased in both populations (P12-14 and P7-8) of cortical excitatory pyramidal neurons (PNs). However, the spike number only decreased in P7-8 PNs, while it remained stable in P12-14 PNs at 39°C. In addition, the fever temperature also reduced the late peak postsynaptic potential (PSP) in P12-14 PNs. The authors further characterized the firing properties of cortical P12-14 PNs, identifying two types: STAY PNs that retained spiking at 30°C, 36°C, and 39°C, and STOP PNs that stopped spiking upon temperature change. They further extended their analysis and characterization to striatal medium spiny neurons (MSNs) and found that STAY MSNs and PNs shared the same ST temperature sensitivity. Using small molecule tools, they further identified that thermo-TRPV3 currents in cortical PNs increased in response to temperature elevation, but not TRPV4 currents. The authors concluded that during fever, neuronal firing stability is largely maintained by sensory STAY PNs and MSNs that express functional TRPV3 channels. Overall, this study is well designed and executed with substantial controls, some interesting findings, and quality of data. Here are some specific comments:

      (1) Could the authors discuss, or is there any evidence of, changes in TRPV3 expression levels in the brain during the postnatal 1-4 week age range in mice?

      (2) Are there any differences in TRPV3 expression patterns that could explain the different firing properties in response to fever temperature between the STAY and STOP neurons?

      (3) TRPV3 and TRPV4 can co-assemble to form heterotetrameric channels with distinct functional properties. Do STOP neurons exhibit any firing behaviors that could be attributed to the variable TRPV3/4 assembly ratio?

      (4) In Figure 7, have the authors observed an increase of TRPV3 currents in MSNs in response to temperature elevation?

      (5) Is there any evidence of a relationship between TRPV3 expression levels in D2+ MSNs and degeneration of dopamine-producing neurons?

      (6) Does fever range temperature alter the expressions of other neuronal Kv channels known to regulate the firing threshold?

    3. Reviewer #2 (Public review):

      Summary:

      The authors study the excitability of layer 2/3 pyramidal neurons in response to layer 4 stimulation at temperatures ranging from 30 to 39°C in P7-8, P12-P14, and P22-P24 animals. They also measure brain temperature and spiking in vivo in response to externally applied heat. Some pyramidal neurons continue to fire action potentials in response to stimulation at 39°C and are called stay neurons. Stay neurons have unique properties aided by TRPV3 channel expression.

      Strengths:

      The authors use various techniques and assemble large amounts of data.

      Weaknesses:

      (1) No hyperthermia-induced seizures were recorded in the study.

      (2) Febrile seizures in humans are age-specific, extending from 6 months to 6 years. While translating to rodents is challenging, according to published literature (see Baram), rodents aged P11-16 experience seizures upon exposure to hyperthermia. The rationale for publishing data on P7-8 and P22-24 animals, which are outside this age window, must be clearly explained to address a potential weakness in the study.

      (3) The authors evoked responses from layer 4 and recorded postsynaptic potentials, which then caused action potentials in layer 2/3 neurons in current clamp. The postsynaptic potentials are exquisitely temperature-sensitive, as the authors demonstrate in Figures 3B and 7D. Note the markedly altered decay of synaptic potentials with rising temperature in these traces. The altered decays will likely change the activation and inactivation of voltage-gated ion channels, adjusting the action potential threshold.

      (4) The data weakly supports the claim that the E-I balance is unchanged at higher temperatures. Synaptic transmission is exquisitely temperature-sensitive due to the many proteins and enzymes involved. A comprehensive analysis of spontaneous synaptic current amplitude, decay, and frequency is crucial to fully understand the effects of temperature on synaptic transmission.

      (5) It is unclear how the temperature sensitivity of medium spiny neurons is relevant to febrile seizures. Furthermore, the most relevant neurons are hippocampal neurons since the best evidence from human and rodent studies is that febrile seizures involve the hippocampus.

      (6) TRPV3 data would be convincing if the knockout animals did not have febrile seizures.

    4. Reviewer #3 (Public review):

      Summary:

      This important study combines in vitro and in vivo recording to determine how the firing of cortical and striatal neurons changes during a fever range temperature rise (37-40°C). The authors found that certain neurons will start, stop, or maintain firing during these body temperature changes. The authors further suggested that the TRPV3 channel plays a role in maintaining cortical activity during fever.

      Strengths:

      The topic of how the firing pattern of neurons changes during fever is unique and interesting. The authors carefully used in vitro electrophysiology assays to study this interesting topic.

      Weaknesses:

      (1) In vivo recording is a strength of this study. However, data from in vivo recording is only shown in Figures 5A,B. This reviewer suggests the authors further expand on the analysis of the in vivo Neuropixels recording. For example, to show single spike waveforms and raster plots to provide more information on the recording. The authors can also separate the recording based on brain regions (cortex vs striatum) using the depth of the probe as a landmark to study the specific firing of cortical neurons and striatal neurons. It is also possible to use published parameters to separate the recording based on spike waveform to identify regular principal neurons vs fast-spiking interneurons. Since the authors studied E/I balance in brain slices, it would be very interesting to see whether the "E/I balance" based on the firing of excitatory neurons vs fast-spiking interneurons might be changed or not in the in vivo condition.

      (2) The authors should propose a potential mechanism for how TRPV3 helps to maintain cortical activity during fever. Would calcium influx-mediated change of membrane potential be the possible reason? Making a summary figure to put all the findings into perspective and propose a possible mechanism would also be appreciated.

      (3) The authors studied P7-8, P12-14, and P20-26 mice. How do these ages correspond to human ages? It would be nice to provide a comparison to help the reader understand the context better.

    5. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      The paper by Chen et al. describes the role of neuronal thermo-TRPV3 channels in the firing of cortical neurons at a fever temperature range. The authors began by demonstrating that exposure to infrared light increasing ambient temperature causes body temperature to rise to a fever level above 38°C. Subsequently, they showed that at the fever temperature of 39°C, the spike threshold (ST) increased in both populations (P12-14 and P7-8) of cortical excitatory pyramidal neurons (PNs). However, the spike number only decreased in P7-8 PNs, while it remained stable in P12-14 PNs at 39°C. In addition, the fever temperature also reduced the late peak postsynaptic potential (PSP) in P12-14 PNs. The authors further characterized the firing properties of cortical P12-14 PNs, identifying two types: STAY PNs that retained spiking at 30°C, 36°C, and 39°C, and STOP PNs that stopped spiking upon temperature change. They further extended their analysis and characterization to striatal medium spiny neurons (MSNs) and found that STAY MSNs and PNs shared the same ST temperature sensitivity. Using small molecule tools, they further identified that thermo-TRPV3 currents in cortical PNs increased in response to temperature elevation, but not TRPV4 currents. The authors concluded that during fever, neuronal firing stability is largely maintained by sensory STAY PNs and MSNs that express functional TRPV3 channels. Overall, this study is well designed and executed with substantial controls, some interesting findings, and quality of data. Here are some specific comments:

      (1) Could the authors discuss, or is there any evidence of, changes in TRPV3 expression levels in the brain during the postnatal 1-4 week age range in mice? 

      To our knowledge, no published studies have documented changes in TRPV3 expression levels in the brain during the 1st to 4th postnatal weeks in mice. Research on TRPV3 expression in the mouse brain has primarily involved RT-PCR analysis of RNA from dissociated tissue in adult mice (Jang et al., 2012; Kumar et al., 2018), largely due to the scarcity of effective antibodies for brain tissue sections at the time of publication. Furthermore, the Allen Brain Atlas lacks data on TRPV3 expression in the developing or postnatal brain. To address this gap, we plan to examine TRPV3 expression at P7-8, P12-13, and P20-23 as part of our manuscript revision.

      (2) Are there any differences in TRPV3 expression patterns that could explain the different firing properties in response to fever temperature between the STAY and STOP neurons?

      This is an excellent question and one we plan to explore in the future by developing reporter mice or viral tools to monitor the activity of cells with endogenous TRPV3 expression. To our knowledge, these tools do not currently exist. Creating them will be challenging, as it requires identifying promoters that accurately reflect endogenous TRPV3 expression. We have not yet quantified TRPV3 expression in STOP and STAY neurons; however, our analysis of evoked spiking activity at 30, 36, and 39°C suggests that TRPV3 expression may mark a population of pyramidal neurons that tend to STAY spiking as temperatures increase. To investigate this further, we are considering patch-seq for TRPV3 expression on recorded neurons. This is a complex experiment, as it requires recording activity at three different temperatures and subsequently collecting the cell contents. While success is not guaranteed, we are committed to attempting these experiments as part of our revisions.

      (3) TRPV3 and TRPV4 can co-assemble to form heterotetrameric channels with distinct functional properties. Do STOP neurons exhibit any firing behaviors that could be attributed to the variable TRPV3/4 assembly ratio? 

      There is some evidence that TRPV3 and TRPV4 proteins can physically associate in HEK293 cells and native skin tissues (Hu et al., 2022). TRPV3 and TRPV4 are both expressed in the cortex (Kumar et al., 2018), but it remains unclear whether they are co-expressed and co-assembled to form heteromeric channels in cortical excitatory pyramidal neurons. Examination of the I-V curve from HEK cells co-expressing TRPV3/4 heteromeric channels shows enhanced current at negative membrane potentials (Hu et al., 2022).

      Currently, we cannot characterize cells as STOP or STAY and measure TRPV3 or TRPV4 currents simultaneously, as this would require different experimental setups and internal solutions. Additionally, the protocol involves a sequence of recordings at 30, 36, and 39°C, followed by cooling back to 30°C and re-heating to each temperature. Cells undergoing such a protocol are unlikely to survive until the end of the recording.

      In our recordings of TRPV3 currents—which likely include both STOP and STAY cells—we do not observe a significant current at negative voltages, suggesting that TRPV3/4 heteromeric channels may either be absent or underrepresented, at least at a 1:1 ratio. However, the possibility that TRPV3/4 heteromeric channels could define the STOP cell population is intriguing and plausible.

      (4) In Figure 7, have the authors observed an increase of TRPV3 currents in MSNs in response to temperature elevation? 

      We have not recorded TRPV3 currents in MSNs in response to elevated temperatures.

      (5) Is there any evidence of a relationship between TRPV3 expression levels in D2+ MSNs and degeneration of dopamine-producing neurons? 

      This is an interesting question, though it falls outside our current research focus in the lab. A PubMed search yields no results connecting the terms TRPV3, MSNs, and degeneration. However, gain-of-function mutations in TRPV4 channel activity have been implicated in motor neuron degeneration (Sullivan et al., 2024) and axon degeneration (Woolums et al., 2020). Similarly, TRPV1 activation has been linked to developmental axon degeneration (Johnstone et al., 2019), while TRPV3 blockade has shown neuroprotective effects in models of cerebral ischemia/reperfusion injury in mice (Chen et al., 2022).

      The link between TRPV activation and cell degeneration, however, may not be straightforward. For instance, TRPV1 loss has been shown to accelerate stress-induced degradation of axonal transport from retinal ganglion cells to the superior colliculus and to cause degeneration of axons in the optic nerve (Ward et al., 2014). Meanwhile, TRPV1 activation by capsaicin preserves the survival and function of nigrostriatal dopamine neurons in the MPTP mouse model of Parkinson's disease (Chung et al., 2017).

      (6) Does fever range temperature alter the expressions of other neuronal Kv channels known to regulate the firing threshold? 

      This is an active line of investigation in our lab. The results of ongoing experiments will provide further insight into this question.

      Reviewer #2 (Public review): 

      Summary: 

      The authors study the excitability of layer 2/3 pyramidal neurons in response to layer 4 stimulation at temperatures ranging from 30 to 39°C in P7-8, P12-P14, and P22-P24 animals. They also measure brain temperature and spiking in vivo in response to externally applied heat. Some pyramidal neurons continue to fire action potentials in response to stimulation at 39°C and are called stay neurons. Stay neurons have unique properties aided by TRPV3 channel expression.

      Strengths: 

      The authors use various techniques and assemble large amounts of data. 

      Weaknesses: 

      (1) No hyperthermia-induced seizures were recorded in the study. 

      The goal of this manuscript is to uncover the age-related physiological changes that enable the brain to retain function at fever temperatures. These changes may potentially explain why most children do not experience febrile seizures or why, in the rare cases when they do occur, the most prominent window of susceptibility is between 2-5 years of age (Shinnar and O’Dell, 2004), as this may coincide with the window during which these developmental changes are normally occurring. While it is possible that impairments in these mechanisms could result in febrile seizures, another possibility is that neural activity may fall below the level required to maintain normal function.

      (2) Febrile seizures in humans are age-specific, extending from 6 months to 6 years. While translating to rodents is challenging, according to published literature (see Baram), rodents aged P11-16 experience seizures upon exposure to hyperthermia. The rationale for publishing data on P7-8 and P22-24 animals, which are outside this age window, must be clearly explained to address a potential weakness in the study. 

      This manuscript focuses on identifying the age-related physiological changes that enable the brain to retain function at fever temperatures. To this end, we examine two age periods flanking the putative window of susceptibility (P12-14), specifically an earlier timepoint (P7-8) and a later timepoint (P20-23). The inclusion of these time points also serves as a negative control, allowing us to determine whether the changes we observe in the proposed window of susceptibility are unique to this period. We believe that including these windows ensures a thorough and objective scientific approach.

      (3) The authors evoked responses from layer 4 and recorded postsynaptic potentials, which then caused action potentials in layer 2/3 neurons in current clamp. The postsynaptic potentials are exquisitely temperature-sensitive, as the authors demonstrate in Figures 3B and 7D. Note the markedly altered decay of synaptic potentials with rising temperature in these traces. The altered decays will likely change the activation and inactivation of voltage-gated ion channels, adjusting the action potential threshold.

      In Figure 4B, we surmised that the temperature-induced reductions in inhibition and the subsequent loss of the late PSP primarily contribute to the altered decay of the synaptic potentials.

      (4) The data weakly supports the claim that the E-I balance is unchanged at higher temperatures. Synaptic transmission is exquisitely temperature-sensitive due to the many proteins and enzymes involved. A comprehensive analysis of spontaneous synaptic current amplitude, decay, and frequency is crucial to fully understand the effects of temperature on synaptic transmission. 

      Thank you for the opportunity to provide clarification. It was not stated, nor did we intend to imply, that in general, E-I balance is unchanged at higher temperatures. Please see the excerpt from the manuscript below. The statements specifically referred to observations made for experiments conducted during the P20-26 age range for cortical pyramidal neurons. We have a parallel line of investigation exploring the differential susceptibility of E-I balance based on age and temperature. Additionally, our measurements focus on evoked activity, rather than spontaneous activity, as these events are more likely linked to the physiological changes underlying behavior in the sensory cortex.

      “As both excitatory and inhibitory PNs that stay spiking increase their firing rates (Figure 5B) and considering that some neurons within the network are inactive throughout or stop spiking, it is plausible that these events are calibrated such that despite temperature increases, the excitatory to inhibitory (E-I) balance within the circuit may remain relatively unchanged. Indeed, recordings of L4-evoked excitatory and inhibitory postsynaptic currents (respectively EPSCs and IPSCs) in wildtype L2/3 excitatory PNs in S1 cortex, where inhibition is largely mediated by the parvalbumin-positive (PV) interneurons, showed that E-I balance (defined as E/(E+I), the ratio of the excitatory current to the total current) remained unchanged as temperature increased from 36 to 39°C (Figure 5E).”
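
      As a minimal sketch of the E-I balance definition quoted above, E/(E+I), the ratio can be computed from a pair of evoked current amplitudes. The function name, the use of absolute peak amplitudes, and the example values are illustrative assumptions, not the authors' analysis code.

```python
def ei_balance(epsc_amplitude_pa, ipsc_amplitude_pa):
    """Return E / (E + I) from evoked EPSC and IPSC amplitudes.

    Assumption for illustration: amplitudes are peak currents in pA and
    only their magnitudes matter, so inward (negative) EPSCs and outward
    (positive) IPSCs are both converted to absolute values.
    """
    excitation = abs(epsc_amplitude_pa)
    inhibition = abs(ipsc_amplitude_pa)
    total = excitation + inhibition
    if total == 0:
        raise ValueError("E + I is zero; the ratio is undefined")
    return excitation / total


# Hypothetical amplitudes: if E and I scale by the same factor between
# two temperatures, the ratio does not change.
balance_36c = ei_balance(-200.0, 600.0)
balance_39c = ei_balance(-100.0, 300.0)
```

      Under this definition a value of 0.5 means equal excitatory and inhibitory current, and the ratio is unaffected when both currents scale together, which is why unchanged E/(E+I) does not by itself imply unchanged synaptic currents.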

      (5) It is unclear how the temperature sensitivity of medium spiny neurons is relevant to febrile seizures. Furthermore, the most relevant neurons are hippocampal neurons since the best evidence from human and rodent studies is that febrile seizures involve the hippocampus. 

      Thank you for the opportunity to clarify. Our goal was not to establish a link between medium spiny neuron (MSN) function and febrile seizures. The manuscript's focus is on identifying age-related physiological changes that enable supragranular cortical cells in the brain to retain function at fever temperatures. MSNs were selected for mechanistic comparison in this study because they represent a non-pyramidal, non-excitatory neuronal subtype, allowing us to assess whether the physiological changes observed in L2/3 excitatory pyramidal neurons are unique to these cells.

      (6) TRPV3 data would be convincing if the knockout animals did not have febrile seizures. 

      Could you kindly provide the reference indicating that TRPV3 KO mice have seizures? Unfortunately, we were unable to locate this reference. It is important to distinguish febrile seizures, which occur within the range of physiological body temperatures (~38 to 40°C), from seizures resulting from heat stroke, a severe form of hyperthermia occurring when body temperature exceeds 40°C. Mechanistically, these may represent different phenomena, as the latter is typically associated with widespread protein denaturation and cell death, whereas febrile seizures are usually non-lethal. Additionally, TRPV3 is located on chromosome 17p13.2, a region not currently associated with seizure susceptibility.

      Reviewer #3 (Public review): 

      Summary: 

      This important study combines in vitro and in vivo recording to determine how the firing of cortical and striatal neurons changes during a fever range temperature rise (37-40°C). The authors found that certain neurons will start, stop, or maintain firing during these body temperature changes. The authors further suggested that the TRPV3 channel plays a role in maintaining cortical activity during fever. 

      Strengths: 

      The topic of how the firing pattern of neurons changes during fever is unique and interesting. The authors carefully used in vitro electrophysiology assays to study this interesting topic. 

      Weaknesses: 

      (1) In vivo recording is a strength of this study. However, data from in vivo recording is only shown in Figures 5A,B. This reviewer suggests the authors further expand on the analysis of the in vivo Neuropixels recording. For example, to show single spike waveforms and raster plots to provide more information on the recording. The authors can also separate the recording based on brain regions (cortex vs striatum) using the depth of the probe as a landmark to study the specific firing of cortical neurons and striatal neurons. It is also possible to use published parameters to separate the recording based on spike waveform to identify regular principal neurons vs fast-spiking interneurons. Since the authors studied E/I balance in brain slices, it would be very interesting to see whether the "E/I balance" based on the firing of excitatory neurons vs fast-spiking interneurons might be changed or not in the in vivo condition. 

      As requested, in the revised manuscript, we will include examples of single spike waveforms and raster plots for the in vivo recordings. Please note that all recordings were conducted in the cortex, not the striatum. To clarify, we used published parameters to separate the recordings based on spike waveform, which allowed us to identify regular principal neurons and fast-spiking interneurons. The paragraph below from the methods section describes this procedure.

      “Following manual curation, based on their spike waveform duration, the selected single units (n = 633) were separated into putative inhibitory interneurons and excitatory principal cells (Barthó et al., 2004). The spike duration was calculated as the time difference between the trough and the subsequent waveform peak of the mean filtered (300 – 6000 Hz bandpassed) spike waveform. Durations of extracellularly recorded spikes showed a bimodal distribution (Hartigan’s dip test; p < 0.001) characteristic of the neocortex with shorter durations corresponding to putative interneurons (narrow spikes) and longer durations to putative principal cells (wide spikes). Next, k-means clustering was used to separate the single units into these two groups, which resulted in 140 interneurons (spike duration < 0.6 ms) and 493 principal cells (spike duration > 0.6 ms), corresponding to a typical 22% - 78% (interneuron – principal) cell ratio.”
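
      The classification procedure in the excerpt above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the input is assumed to be an already band-pass-filtered mean waveform per unit, the bimodality test is omitted, and a plain two-cluster 1-D k-means written in NumPy stands in for whatever k-means implementation was actually used.

```python
import numpy as np


def trough_to_peak_ms(mean_waveform, sampling_rate_hz):
    """Time from the waveform trough to the subsequent peak, in ms.

    Assumes `mean_waveform` is the mean, already filtered spike waveform
    of one unit, with the spike trough as its global minimum.
    """
    waveform = np.asarray(mean_waveform, dtype=float)
    trough = int(np.argmin(waveform))
    # Search for the peak only in the samples at or after the trough.
    peak = trough + int(np.argmax(waveform[trough:]))
    return (peak - trough) / sampling_rate_hz * 1e3


def split_narrow_wide(durations_ms, n_iter=100):
    """Two-cluster 1-D k-means on spike durations.

    Returns a boolean mask that is True for units in the wide-spike
    cluster (putative principal cells) and False for the narrow-spike
    cluster (putative interneurons).
    """
    values = np.asarray(durations_ms, dtype=float)
    # Initialise the two centres at the extremes of the distribution.
    centers = np.array([values.min(), values.max()])
    labels = np.zeros(values.size, dtype=int)
    for _ in range(n_iter):
        # Assign each unit to its nearest centre.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        # Recompute centres, keeping the old one if a cluster is empty.
        new_centers = np.array([
            values[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in (0, 1)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels == int(np.argmax(centers))
```

      With real data, the boundary between the two clusters falls wherever the centres settle; the 0.6 ms boundary reported in the excerpt is a property of that dataset rather than a parameter of this sketch.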

      In vivo patching to record extracellular and inhibitory responses at 36°C and then waiting 10 minutes to record again at 39°C would be an extremely challenging experiment. Due to the high difficulty and expected very low yield, these experiments will not be pursued for the revision studies.

      (2) The authors should propose a potential mechanism for how TRPV3 helps to maintain cortical activity during fever. Would calcium influx-mediated change of membrane potential be the possible reason? Making a summary figure to put all the findings into perspective and propose a possible mechanism would also be appreciated. 

      Thank you for your helpful suggestions. In response to your recommendation, we will include a summary figure detailing the hypothesis currently described in the discussion section of the manuscript. The excerpt from the discussion is included below.

      “Although TRPV3 channels are cation-nonselective, they exhibit high permeability to Ca²⁺ (Ca²⁺ > Na⁺ ≈ K⁺ ≈ Cs⁺) with permeability ratios (relative to Na⁺) of 12.1, 0.9, 0.9, 0.9 (Xu et al., 2002). Opening of TRPV3 channels activates a nonselective cationic conductance and elevates membrane depolarization, which can increase the likelihood of generating action potentials. Indeed, our observations of a loss of the temperature-induced increases in the PSP with TRPV3 blockade are consistent with a reduction in membrane depolarization. In S1 cortical circuits at P12-14, STAY PNs appear to rely on a temperature-dependent activity mechanism, where depolarization levels (mediated by higher excitatory input and lower inhibitory input) are scaled to match the cell’s ST. Thus, an inability to increase PSPs with temperature elevations prevents PNs from reaching ST, so they cease spiking.”

      (3) The authors studied P7-8, P12-14, and P20-26 mice. How do these ages correspond to human ages? It would be nice to provide a comparison to help the reader understand the context better.

      Ideally, the mouse-human age comparison would depend on the specific process being studied. Please note that these periods are described in the introduction of the manuscript. The relevant excerpt is included below. Let us know if you need any additional modifications to this description.

      “Using wildtype mice across three postnatal developmental periods—postnatal day (P)7-8 (neonatal/early), P12-14 (infancy/mid), and P20-26 (juvenile/late)—we investigated the electrophysiological properties, ex vivo and in vivo, that enable excitatory pyramidal neurons (PNs) in mouse primary somatosensory (S1) cortex to remain active during temperature increases from 30°C (standard in electrophysiology studies) to 36°C (physiological temperature), and then to 39°C (fever-range).”

    1. eLife Assessment

      This important study describes a computational tool termed FLiSimBA (Fluorescence Lifetime Simulation for Biological Applications), which uses simulations to rigorously assess experimental limitations in fluorescence lifetime imaging microscopy (FLIM), including diverse noise factors, hardware effects, and sensor expression levels. The evidence from simulation and experimental measurements supporting the usefulness of FLiSimBA is solid. The authors may improve the application of the tool to a wide range of biological samples by providing the simulation package, currently in MATLAB, in other common languages such as Python, and having better descriptions of the fitting algorithm and model assumptions. The work will interest scientists who wish to perform quantitative FLIM imaging for cells and tissues.

    2. Reviewer #1 (Public review):

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise on fluorescence lifetime measurement interpretations. They introduce a computational framework they named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)" to model experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and determine parameters for achieving multiplexed imaging of dynamic biosensors using lifetime and intensity. By quantitatively defining sensor photon effects on signal-to-noise in either fitting or averaging methods of determining lifetime, the authors contradict any claims of FLIM sensor expression insensitivity to fluorescence lifetime and highlight how these artifacts occur differently depending on the analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved.

      A major strength of the study is the effort to present results in a clear and understandable way given that most researchers do not think about these factors on a day-to-day basis. The model code is available and written in MATLAB, which should make it readily accessible, although a version in other common languages such as Python might help with dissemination in the community. One potential weakness is that the model uses parameters that are determined in a specific way by the authors, and it is not clear how vastly other biological tissue and microscope setups may differ from the values used by the authors.

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) will affect lifetime measurements and they present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should bring to awareness an issue that new users of lifetime biosensors may not be aware of and that experts, while aware, have not quantitatively determined the conditions where these issues arise. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and the development of new sensors with more favorable properties.

    3. Reviewer #2 (Public review):

      Summary:

      By using simulations of common signal artefacts introduced by acquisition hardware and the sample itself, the authors are able to demonstrate methods to estimate their influence on the estimated lifetime, and lifetime proportions, when using signal fitting for fluorescence lifetime imaging.

      Strengths:

      They consider a range of effects such as after-pulsing and background signal, and present a range of situations that are relevant to many experimental situations.

      Weaknesses:

      A weakness is that they do not present enough detail on the fitting method that they used to estimate lifetimes and proportions. The method used will influence the results significantly. They seem to only use the "empirical lifetime", which is not a state-of-the-art algorithm. The method used to deconvolve two multiplexed exponential signals is not given.

    4. Reviewer #3 (Public review):

      Summary:

      This study presents a useful computational tool, termed FLiSimBA. The MATLAB-based FLiSimBA simulations allow users to examine the effects of various noise factors (such as autofluorescence, afterpulse of the photomultiplier tube detector, and other background signals) and varying sensor expression levels. Under the conditions explored, the simulations unveiled how these factors affect the observed lifetime measurements, thereby providing useful guidelines for experimental designs. Further simulations with two distinct fluorophores uncovered conditions in which two different lifetime signals could be distinguished, indicating multiplexed dynamic imaging may be possible.

      Strengths:

      The simulations and their analyses were done systematically and rigorously. FLiSimBA can be useful for guiding and validating fluorescence lifetime imaging studies. The simulations could define useful parameters such as the minimum number of photons required to detect a specific lifetime, how sensor protein expression level may affect the lifetime data, the conditions under which the lifetime would be insensitive to the sensor expression levels, and whether certain multiplexing could be feasible.

      Weaknesses:

      The analyses have relied on a key premise that the fluorescence lifetime in the system can be described as two-component discrete exponential decay. This means that the experimenter should ensure that this is the right model for their fluorophores a priori and should keep in mind that the fluorescence lifetime of the fluorophores may not be perfectly described by a two-component discrete exponential (for which alternative algorithms have been implemented: e.g., Steinbach, P. J. Anal. Biochem. 427, 102-105, (2012)). In this regard, I also couldn't find how good the fits were for each simulation and experimental data to the given fitting equation (Equation 2, for example, for Figure 2C data).

      Also, in Figure 2C, the 'sensor only' simulation without accounting for autofluorescence (as seen in Sensor + autoF) or afterpulse and background fluorescence (as seen in Final simulated data) seems to recapitulate the experimental data reasonably well. So, at least in this particular case where experimental data is limited by its broad spread with limited data points, being able to incorporate the additional noise factors into the simulation tool didn't seem to matter too much.

    5. Author response:

      eLife Assessment

      This important study describes a computational tool termed FLiSimBA (Fluorescence Lifetime Simulation for Biological Applications), which uses simulations to rigorously assess experimental limitations in fluorescence lifetime imaging microscopy (FLIM), including diverse noise factors, hardware effects, and sensor expression levels. The evidence from simulation and experimental measurements supporting the usefulness of FLiSimBA is solid. The authors may improve the application of the tool to a wide range of biological samples by providing the simulation package, currently in MATLAB, in other common languages such as Python, and having better descriptions of the fitting algorithm and model assumptions. The work will interest scientists who wish to perform quantitative FLIM imaging for cells and tissues.

      We thank the editors and reviewers for the constructive feedback. We plan to provide the FLiSimBA simulation package in Python in addition to MATLAB. We will also describe our fitting method in more detail in the Results section. Furthermore, we will explain more clearly in the text that our simulation package makes almost no model assumptions and is flexible and adaptable, so that it can be used for any fluorescence lifetime measurements. We will clearly outline the specific examples we use for our case studies, and how users can input their own values based on the specific sensors, autofluorescence, and hardware they use.

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise on fluorescence lifetime measurement interpretations. They introduce a computational framework they named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)" to model experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and determine parameters for achieving multiplexed imaging of dynamic biosensors using lifetime and intensity. By quantitatively defining sensor photon effects on signal-to-noise in either fitting or averaging methods of determining lifetime, the authors contradict any claims of FLIM sensor expression insensitivity to fluorescence lifetime and highlight how these artifacts occur differently depending on the analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved.

      A major strength of the study is the effort to present results in a clear and understandable way given that most researchers do not think about these factors on a day-to-day basis. The model code is available and written in Matlab, which should make it readily accessible, although a version in other common languages such as Python might help with dissemination in the community. One potential weakness is that the model uses parameters that are determined in a specific way by the authors, and it is not clear how vastly other biological tissue and microscope setups may differ from the values used by the authors.

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) will affect lifetime measurements and they present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should bring to awareness an issue that new users of lifetime biosensors may not be aware of and that experts, while aware, have not quantitatively determined the conditions where these issues arise. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and the development of new sensors with more favorable properties.

      We appreciate the comments and helpful suggestions. We plan to present FLiSimBA simulation code in Python in addition to Matlab to make it more accessible to the community.

      One of the advantages of FLiSimBA is that the simulation package is flexible and adaptable, allowing users to input parameters based on the specific sensors, hardware, and autofluorescence measurements for their biological and optical systems. We used parameters based on one FRET-based sensor, measured autofluorescence from mouse tissue, and measured dark count/after pulse of our specific GaAsP PMT in this manuscript as examples. We will emphasize this advantage and further clarify how these parameters can be adapted to diverse tissues, imaging systems, and sensors based on individual users in our revision.

      Reviewer #2 (Public review):

      Summary:

      By using simulations of common signal artefacts introduced by acquisition hardware and the sample itself, the authors are able to demonstrate methods to estimate their influence on the estimated lifetime, and lifetime proportions, when using signal fitting for fluorescence lifetime imaging.

      Strengths:

      They consider a range of effects such as after-pulsing and background signal, and present a range of situations that are relevant to many experimental situations.

      Weaknesses:

      A weakness is that they do not present enough detail on the fitting method that they used to estimate lifetimes and proportions. The method used will influence the results significantly. They seem to use only the "empirical lifetime", which is not a state-of-the-art algorithm. The method used to deconvolve two multiplexed exponential signals is not given.

      We appreciate the comments and constructive feedback and will more clearly describe the fitting methods in our revision.

      Two metrics are currently used to estimate lifetime in our paper, both described in the Methods sections ‘Experimental data collection, parameter determination, and simulation’ and ‘FLIM analysis’: (1) fitted P1: lifetime histograms were fitted to Equation 2 with the Gauss-Newton nonlinear least-squares fitting algorithm, and the fitted P1 was used as the lifetime estimate; (2) empirical lifetime, defined by Equation 5. These two metrics were used for the following reasons: (1) when the exponential decay equation of a sensor is known (for example, the FRET-based PKA activity sensor FLIM-AKAR can be described by a double exponential equation), the fitted coefficients for each exponential component provide a robust lifetime estimate that is less sensitive to noise and background signals; (2) when the biophysical properties of a sensor are unknown, or when the sensor cannot be easily described with single or double exponential equations, empirical lifetime (i.e., the average lifetime value) provides an unbiased way to quantify fluorescence lifetime without assuming an underlying model of the sensor lifetime.

      To deconvolve two multiplexed exponential signals (Fig. 8), histograms were fitted to Equation 2 with the Gauss-Newton nonlinear least-squares fitting algorithm, as described in the Methods section ‘Simulation and analysis of multiplexed imaging with fluorescence intensity and lifetime data’.
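
      The two metrics can be illustrated with a minimal sketch. It is not the FLiSimBA code: the model form, the time constants, the time window, and the function names are assumptions chosen for illustration, the instrument response is ignored, and the mean photon arrival time stands in for the empirical lifetime of Equation 5.

```python
import numpy as np

# Hypothetical values for illustration only (not taken from the manuscript).
TAU1, TAU2 = 2.0, 0.7                              # fixed decay time constants, ns
T = np.linspace(0.0, 12.0, 256, endpoint=False)    # histogram time bins, ns


def double_exp(f0, p1, t=T):
    """Assumed model: F(t) = F0 * (P1*exp(-t/tau1) + (1 - P1)*exp(-t/tau2))."""
    return f0 * (p1 * np.exp(-t / TAU1) + (1.0 - p1) * np.exp(-t / TAU2))


def fit_p1_gauss_newton(hist, p1=0.5, n_iter=50, tol=1e-12):
    """Gauss-Newton least-squares fit of (F0, P1) with tau1 and tau2 held fixed."""
    e1, e2 = np.exp(-T / TAU1), np.exp(-T / TAU2)
    f0 = float(hist.max())                         # simple nonzero starting guess
    for _ in range(n_iter):
        resid = hist - double_exp(f0, p1)
        # Jacobian columns: dF/dF0 and dF/dP1
        jac = np.column_stack([p1 * e1 + (1.0 - p1) * e2, f0 * (e1 - e2)])
        step, *_ = np.linalg.lstsq(jac, resid, rcond=None)
        f0, p1 = f0 + step[0], p1 + step[1]
        if np.max(np.abs(step)) < tol:
            break
    return f0, p1


def empirical_lifetime(hist, t=T):
    """Mean photon arrival time, sum(F(t) * t) / sum(F(t)); no decay model needed."""
    return np.sum(hist * t) / np.sum(hist)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hist = rng.poisson(double_exp(2000.0, 0.6)).astype(float)  # photon-counting noise
    f0_hat, p1_hat = fit_p1_gauss_newton(hist)
    print(f"fitted P1 = {p1_hat:.3f}, empirical lifetime = {empirical_lifetime(hist):.3f} ns")
```

      The fit uses knowledge of the decay model, whereas the empirical lifetime is computed directly from the histogram, which is why it applies to sensors whose decay equation is unknown.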

      Considering the importance of these methodological details for evaluating the conclusions of this study, and the importance of appreciating the advantages and limitations of different methods of lifetime estimates (e.g. Figure 7), we will move the description of the fitting method to estimate P1 and the method of calculating empirical lifetime from Methods to Results, and will further clarify the rationale of using these different methods of lifetime estimates.

      Reviewer #3 (Public review):

      Summary:

      This study presents a useful computational tool, termed FLiSimBA. The MATLAB-based FLiSimBA simulations allow users to examine the effects of various noise factors (such as autofluorescence, afterpulse of the photomultiplier tube detector, and other background signals) and varying sensor expression levels. Under the conditions explored, the simulations unveiled how these factors affect the observed lifetime measurements, thereby providing useful guidelines for experimental designs. Further simulations with two distinct fluorophores uncovered conditions in which two different lifetime signals could be distinguished, indicating multiplexed dynamic imaging may be possible.

      Strengths:

      The simulations and their analyses were done systematically and rigorously. FLiSimBA can be useful for guiding and validating fluorescence lifetime imaging studies. The simulations could define useful parameters such as the minimum number of photons required to detect a specific lifetime, how sensor protein expression level may affect the lifetime data, the conditions under which the lifetime would be insensitive to the sensor expression levels, and whether certain multiplexing could be feasible.

      Weaknesses:

      The analyses have relied on a key premise that the fluorescence lifetime in the system can be described as two-component discrete exponential decay. This means that the experimenter should ensure that this is the right model for their fluorophores a priori and should keep in mind that the fluorescence lifetime of the fluorophores may not be perfectly described by a two-component discrete exponential (for which alternative algorithms have been implemented: e.g., Steinbach, P. J. Anal. Biochem. 427, 102-105, (2012)). In this regard, I also couldn't find how good the fits were for each simulation and experimental data to the given fitting equation (Equation 2, for example, for Figure 2C data).

      We thank the reviewer for the constructive feedback. We agree that FLiSimBA users should ensure that the right decay equations are used to describe their fluorescent sensors. In this study, we used the FRET-based PKA sensor FLIM-AKAR to provide a proof-of-principle demonstration of FLiSimBA usage. The donor fluorophore of FLIM-AKAR, truncated monomeric enhanced GFP, follows a single exponential decay. FLIM-AKAR, a FRET-based sensor, follows a double exponential decay. The time constants of the two exponential components were determined previously (Chen et al., Frontiers in Pharmacology (2014)). Thus, a double exponential decay equation with known τ1 and τ2 (Equation 1) was used for both simulation and fitting. In our revision, we will refer to our prior study characterizing the double exponential decay model of FLIM-AKAR. We will also emphasize the importance of using the right decay equations, strategies to estimate sensor decays, and how the flexibility of FLiSimBA allows users to input different forms of models to describe their specific sensor histograms. We will additionally provide data showing the goodness of fit for both simulated data and experimental data.

      Also, in Figure 2C, the 'sensor only' simulation without accounting for autofluorescence (as seen in Sensor + autoF) or afterpulse and background fluorescence (as seen in Final simulated data) seems to recapitulate the experimental data reasonably well. So, at least in this particular case where experimental data is limited by its broad spread with limited data points, being able to incorporate the additional noise factors into the simulation tool didn't seem to matter too much.

      We agree that in Figure 2C the contributions from autofluorescence, afterpulse, and background signals are small, because sensor photon count is high here. As seen in Figure 2B, when sensor photon counts are higher, the contributions from these other factors become less pronounced. The simulated data in Figure 2C were based on high photon counts because the simulated P1 value was determined by fitting experimental data. To achieve reasonable fitting with minimal interference from autofluorescence, afterpulse, and background signals, we used experimental data with high sensor expression. We will clarify these details in our revision.
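
      The dependence on sensor photon count can be seen in a small worked sketch. It is an illustration under stated assumptions, not the FLiSimBA implementation: a single exponential sensor decay, a background that is flat across the time window, and hypothetical photon numbers. Autofluorescence, which has its own decay, is not modeled here.

```python
import numpy as np

T = np.linspace(0.0, 12.0, 256, endpoint=False)    # histogram time bins, ns


def expected_histogram(n_sensor, n_background, tau_sensor=2.0):
    """Expected counts: exponential sensor decay plus an assumed flat background."""
    sensor = np.exp(-T / tau_sensor)
    sensor *= n_sensor / sensor.sum()               # scale to n_sensor photons
    background = np.full_like(T, n_background / T.size)
    return sensor + background


def empirical_lifetime(hist):
    """Mean photon arrival time of the histogram."""
    return np.sum(hist * T) / np.sum(hist)


def lifetime_shift(n_sensor, n_background):
    """Change in empirical lifetime caused by adding the background photons."""
    clean = empirical_lifetime(expected_histogram(n_sensor, 0.0))
    return empirical_lifetime(expected_histogram(n_sensor, n_background)) - clean


if __name__ == "__main__":
    for n_sensor in (1e3, 1e4, 1e5):
        print(f"{n_sensor:>8.0f} sensor photons: shift = {lifetime_shift(n_sensor, 100.0):.4f} ns")
```

      With the background held at a fixed number of photons, its fractional weight in the histogram falls as the sensor photon count rises, so the shift in the lifetime estimate shrinks accordingly.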

    1. eLife Assessment

      In this manuscript, Wang et al. describe the identification, and provide initial mechanistic characterization, of a potent probiotic strain with activity against Salmonella enterica infection. The evidence provided is compelling, with multiple and varied methodologies used to support the claims made by the authors. The findings reported are valuable to the probiotic and enteric infection subfields.

    2. Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      Weaknesses:

      Although there is a sizeable amount of data reported in the manuscript, there seems to be a chronic issue of lack of details of how some experiments were performed. This is particularly true in the figure legends, which for the most part lack enough details to allow comprehension without constant return to the text. Additionally, 2 figures are missing. Figure 6 is a repetition of Figure 5, and Figure S4 is an identical replicate of Figure S3.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays was carried out to test the hypothesis.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

      Weaknesses:

      I have two main questions about this work.

      (1) The authors provided the below information about the sources from which Lacticaseibacillus rhamnosus was isolated. More details are needed. What are the criteria to choose these samples? Where did these samples originate from? How many strains of bacteria were obtained from which types of samples?

      Lines 486-488: Lactic acid bacteria (LAB) and Enterococcus strains were isolated from the fermented yoghurts collected from families in multiple cities of China and the intestinal contents from healthy piglets without pathogen infection and diarrhoea by our lab.

      Lines 129-133: A total of 290 bacterial strains were isolated and identified from 32 samples of the fermented yoghurt and piglet rectal contents collected across diverse regions within China using MRS and BHI medium, which consists of 63 Streptococcus strains, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus strains, and 69 Enterococcus strains.

      (2) As a probiotic, Lacticaseibacillus rhamnosus has been widely studied. In fact, there are many commercially available products, and Lacticaseibacillus rhamnosus is the main bacteria in these products. There are also ATCC type strains such as 53103.

      I am sure the authors are also interested to know whether P118 is better as a probiotic candidate than other commercially available strains. Also, would the mechanism described for P118 apply to other Lacticaseibacillus rhamnosus strains?

      It would be ideal if the authors could include one or two Lacticaseibacillus rhamnosus strains that are currently used commercially, or that are available from the ATCC. The authors could then compare the efficacy and antibacterial mechanisms of P118 with those of other strains. This would open the door to future work.

    1. eLife Assessment

      Huson & Regehr characterize the spiking responses of UBCs to various patterns of synaptic stimulation and dissect the contributions of relevant glutamate receptors to their transformations. This study presents valuable findings describing how trains of mossy fiber stimulation control cerebellar unipolar brush cell discharges. The evidence that UBCs transform signals in diverse ways depending on their complement of AMPA, mGluR1, and mGluR2 receptors is solid.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors recorded cerebellar unipolar brush cells (UBCs) in acute brain slices. They confirmed that mossy fiber (MF) inputs generate a continuum of UBC responses. Using systematic and physiological trains of MF electrical stimulation, they demonstrated that MF inputs either increased or decreased UBC firing rates (UBC ON vs. OFF) or induced complex, long-lasting modulation of their discharges. The MF influence on UBC firing was directly associated with a specific combination of metabotropic glutamate receptors, mGluR2/3 (inhibitory) and mGluR1 (excitatory). Ultimately, the amount and ratio of these two receptors controlled the time course of the effect, yielding specific temporal transformations such as phase shifts.

      Overall, the topic is compelling, as it broadens our understanding of temporal processing in the cerebellar cortex. The experiments are well-executed and properly analyzed.

      Strengths:

      (1) A wide range of MF stimulation patterns was explored, including burst duration and frequency dependency, which could serve as a valuable foundation for explicit modeling of temporal transformations in the granule cell layer.

      (2) The pharmacological blockade of mGluR2/3, mGluR1, AMPA, and NMDA receptors helped identify the specific roles of these glutamate receptors.

      (3) The experiments convincingly demonstrate the key role of mGluR1 receptors in temporal information processing by UBCs.

      Weaknesses:

      (1) This study is largely descriptive and represents only a modest incremental advance from the previous work (Guo et al., Nat. Commun., 2021).

      (2) The MF activity used to mimic natural stimulation was previously collected in primates, while the recordings were conducted in mice.

      (3) Inhibition was blocked throughout the study, reducing its physiological relevance.

    3. Reviewer #2 (Public review):

      This study addresses the question of how UBCs transform synaptic input patterns into spiking output patterns and how different glutamate receptors contribute to their transformations. The first figure utilizes recorded patterns of mossy fiber firing during eye movements in the flocculus of rhesus monkeys obtained from another laboratory. In the first figure, these patterns are used to stimulate mossy fibers in the mouse cerebellum during extracellular recordings of UBCs in acute mouse brain slices. The remaining experiments stimulate mossy fiber inputs at different rates or burst durations, which is described as 'mossy-fiber like', although these patterns are considerably simpler than those recorded in vivo. As expected from previous work, AMPA mediates the fast responses, and mGluR1 and mGluR2/3 mediate the majority of longer-duration and delayed responses. The manuscript is well organized and the discussion contextualizes the results effectively.

      The authors use extracellular recordings because the washout of intracellular molecules necessary for metabotropic signaling may occur during whole-cell recordings. These cell-attached recordings do not allow one to confirm that electrical stimulation produces a postsynaptic current on every stimulus. Moreover, it is not clear that the synaptic input is monosynaptic, as UBCs synapse on one another. This leaves open the possibility that delays in firing could be due to disynaptic stimulation. Additionally, the result that AMPA-mediated responses were surprisingly small in many UBCs, despite apparent mRNA expression, suggests the possibility that spillover from other nearby synapses activated the higher affinity extrasynaptic mGluRs and that the main mossy fiber input to the UBC was not being stimulated. For these reasons, some whole-cell recordings (or perforated patch) would show that when stimulation is confirmed to be monosynaptic and reliable it can produce the same range of spiking responses seen extracellularly and that AMPA receptor-mediated currents are indeed small or absent in some UBCs.

      A discussion of whether the tested glutamate receptors affected the spontaneous firing rates of these cells would be informative as standing currents have been reported in UBCs. It is unclear whether the firing rate was normalized for each stimulation, each drug application, or each cell. It would also be informative to report whether UBCs characterized as responding with Fast, Mid-range, Slow, and OFF responses have different spontaneous firing rates or spontaneous firing patterns (regular vs irregular).

      Figure 1 shows examples of how Fast, Mid-range, Slow, and OFF UBCs respond to in vivo MF firing patterns, but lacks a summary of how the input is transformed across a population of UBCs. In panel d, it looks as if the phase of firing becomes more delayed across the examples from Fast to OFF UBCs. Quantifying this input/output relationship more thoroughly would strengthen these results.

      Inhibition was pharmacologically blocked in these studies. Golgi cells and other inhibitory interneurons likely contribute to how UBCs transform input signals. Speculation on how GABAergic and glycinergic synaptic inhibition may contribute would provide additional context to help readers understand how a circuit with intact inhibition may behave.

    1. eLife Assessment

      This study provides important morphological observations related to the potential roles of Hebbian plasticity in establishing brain connectivity, by examining synapses formed by functionally distinct groups of retinal ganglion cell (RGC) axons in albino mouse dorsolateral geniculate nucleus (dLGN). Here, inappropriately projecting contralateral RGCs undergo developmental rewiring alongside ipsilateral RGCs, such that Hebbian theory would predict them to have separate synaptic targets. The authors provide compelling support for some presence of Hebbian rewiring, using combined confocal imaging and serial electron microscopy (EM) reconstructions to show that contralateral RGCs form completely segregated synaptic inputs onto islands of dLGN thalamocortical neurons, as well as somewhat segregated synaptic input onto local inhibitory interneurons. These findings will be of interest to researchers studying synaptic connectivity and plasticity during development.

    2. Reviewer #1 (Public review):

      Summary:

      The authors examined whether aberrantly projecting retinal ganglion cells in albino mice innervate a separate population of thalamocortical neurons, as would be predicted for Hebbian learning rules. The authors find support for this hypothesis in correlated light and electron microscopy (CLEM) reconstructions of retinal ganglion cell axons and thalamocortical neurons. In a second line of investigation, the authors ask the same question about retinal ganglion cell innervation of local inhibitory interneurons of the mouse LGN. The authors conclude that these connections are less specific.

      Strengths:

      The authors make good use of CLEM to test a circuit-level hypothesis, and they find an interesting difference in RGC synaptic innervation patterns for thalamocortical neurons vs. local interneurons.

      Weaknesses:

      The conclusions about the local interneuron innervation are a little more difficult to interpret. One would expect to only capture a small part of the local interneuron dendritic field, as compared to the smaller thalamocortical neurons, right? Doesn't that imply that finding some evidence of promiscuous connectivity means that other dendrites that were not observed probably connect to many different RGCs?

    3. Reviewer #2 (Public review):

      In this article, the authors examined the organization of misplaced retinal inputs in the visual thalamus of albino mice at electron-microscopic (EM) resolution to determine whether these synaptic inputs are segregated from the rest of the retinogeniculate circuitry.

      The study's major strengths include its high resolution, achieved through serial EM and confocal microscopy, which enabled the identification of all synaptic inputs onto neurons in the dorsolateral geniculate nucleus (dLGN).

      The experiments are very precise and demanding; thus, only the synaptic inputs of a few neurons were fully reconstructed in one animal. A few figures could be improved in their presentation.

      Despite this, the authors clearly demonstrate the synaptic segregation of misrouted retinal axons onto dLGN neurons, separate from the rest of the retinogeniculate circuitry.

      This finding is impactful because retinal inputs typically do not segregate within the mouse dLGN, and it was previously thought that this was due to the nucleus's small size, which might prevent proper segregation. The study shows that in cases where axons are misrouted and exhibit a different activity pattern than surrounding retinal inputs, segregation of inputs can indeed occur. This suggests that the normal system has the capacity to segregate inputs, despite the limited volume of the mouse dLGN.

    1. eLife Assessment

      This valuable study compares ChIP-seq and ChEC-seq2 techniques to investigate RNA polymerase II (RNAPII) binding patterns in yeast, revealing that ChEC-seq2 captures distinct regulatory events associated with active transcription missed by ChIP-seq. The authors use ChEC-seq2 data to build a stochastic model of RNAPII kinetics, providing convincing new insights into transcription regulation and the role of the nuclear pore complex. The paper highlights the importance of careful methodological comparisons in understanding RNAPII dynamics.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors use ChEC-seq, an MNase-based method, to map yeast RNA pol II. Part of the reasoning for this study is that earlier biochemical work suggested pol II initiation and termination should involve slow steps at the UAS/promoter and termination regions that are not well visualized by formaldehyde-based ChIP methods. Here the authors find that pol II ChIP and ChEC give complementary patterns. Pol II ChIP signals are strongest in the coding region (where ChIP signal correlates well with transcription (rho = 0.62)). In contrast, pol II ChEC signals are strongest at promoters (rho = 0.52) and terminator regions. Weaker upstream ChEC signals are also observed at the STM class genes where biochemical studies have suggested a form of Pol (and maybe other general factors) is recruited to UAS sites. ChEC of TFIIA and TFIIE gives promoter-specific ChEC signals as expected. Extending this work to the elongation factors Ctk1 and Spt5 unexpectedly gives strong signals near the PIC location and little signal over the coding region. This, and mapping CTD S2 and S5 phosphorylation by ChEC, suggests to me that, for some reason, ChEC isn't optimal for detecting components of the elongation complex over coding regions.

      Examples are also presented where perturbations of transcription can be measured by ChEC. Modeling studies are shown where adjustment of kinetic parameters agrees well with ChEC data and that these models can be used to estimate which steps in transcription are affected by various perturbations. However, no tests were performed to see if the predictions could be validated by other means. Finally, the role of nuclear pore binding by Gcn4 is explored, although the effects are small and this proposal should be explored more completely in future studies. Overall, the authors show that pol II ChEC is a valuable and complementary method for investigating transcription mechanisms and slow steps at the initiation and termination regions.

    3. Reviewer #2 (Public review):

      Summary:

      The study by VanBalzen et al. compares chromatin immunoprecipitation (ChIP-seq) and chromatin endogenous cleavage sequencing (ChEC-seq2) to examine RNA polymerase II (RNAPII) binding patterns in yeast. While ChIP-seq shows RNAPII enrichment mainly over transcribed regions, ChEC-seq2 highlights RNAPII binding at promoters and upstream activating sequences (UASs), suggesting it captures distinct RNAPII populations that the authors speculate are linked more tightly to active transcription. The authors develop a stochastic model for RNAPII kinetics using ChEC-seq2 data, revealing insights into transcription regulation and the role of the nuclear pore complex in stabilizing promoter-associated RNAPII. The study suggests that ChEC-seq2 identifies regulatory events that ChIP-seq may overlook.

      Strengths:

      (1) This is a carefully crafted study that adds significantly to existing literature in this area. Transgenic MNase fusions with endogenous Rpb1 and Rpb3 subunits were carefully performed, and complemented by fusions with several additional proteins that help the authors to dissect the transcription cycle. Both the S. cerevisiae lines and the sequencing data are likely to be of significant use to the community.

      (2) The validation of ChEC-seq2 and its comparison with ChIP-seq is highly valuable technical information for the community.

      (3) The kinetic modeling appears to be thoughtfully done.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Summary of revisions

      Title

      We have changed the title of the manuscript to “Chromatin endogenous cleavage provides a global view of yeast RNA polymerase II transcription kinetics”.

      Text

      Additional discussion of the patterns for elongation factors added (detailed below).

      Small text changes throughout, as mentioned in the detailed response below.

      Figures

      Updated legend-image in Figure 2F to reflect correct colors

      Added Figure 2 – supplement 1F – RNAPII enrichment with shorter promoter dwell times

      Added Figure 2 - supplement 2 with ChIP-seq outcomes (and text legend)

      Removed gene numbers in Figure 5C and put them in the legend.

      Substituted Med1 and Med8 ChEC over Rap1 sites in Figure 5F.

      Moved kin28-is growth inhibition to Figure 5 – Supplement 1.

      Substituted a new panel overlaying the RNAPII enrichment over UASs or promoters for all three strains in Figure 7D.

      Improved the labeling and legend of Figure 7E

      Methods

      Added ChIP-seq performed to confirm that the MNase fusion proteins are able to produce the expected pattern for ChIP.

      Point-by-point response to reviewers’ comments

      Reviewer 1:

      (1) Extending this work to elongation factors Ctk1 and Spt5 unexpectedly gives strong signals near the PIC location and little signal over the coding region. This, together with the mapping of CTD S2 and S5 phosphorylation by ChEC, suggests to me that, for some reason, ChEC isn't optimal for detecting components of the elongation complex over coding regions.

      (3) mapping the elongation factors Spt5 and Ctk1 by ChEC gives unexpected results as the signals over the coding sequences appear weak but unexpectedly strong at promoters and terminators. It would be helpful if the authors could comment on reasons why ChEC may not work well with elongation factors. For example, could this be something to do with the speed of Pol elongation and/or the chromatin structure of coding sequences such that coding sequence DNA is less accessible to MNase cleavage? 

      (7) The mintbodys are an interesting attempt to measure Pol II CTD modifications during elongation but give unexpected results as the signals in the coding region are lower than at promoters and terminators. It seems like ChIP is still a much better option for elongation factors unless I'm missing something. 

      We agree with the reviewer that this is a point that could confuse the reader.  Therefore, we have devoted two additional paragraphs to possible interpretations of our data in the Discussion:

      ChEC with factors involved in elongation (Ctk1, Spt5, Ser2p-RNAPII), when normalized to total RNAPII, showed greater enrichment over the CDS (Figure 3G), as expected. However, it is surprising that we also observed clear enrichment of these factors at promoters (e.g. Figure 3A, E & F). The association of elongation factors with the promoter seems to be biologically relevant. Changes in transcription correlate with changes in ChEC enrichment for these factors and modifications (Figure 4C). Blocking initiation by inhibiting TFIIH kinase led to a reduction of Ser5p RNAPII and Ser2p RNAPII over both the promoter and the transcribed region (Figure 5G). This suggests either that the true signal of these factors over transcribed regions is less evident by ChEC than by ChIP or that ChEC can reveal interactions of elongation factors at early stages of transcription that are missed by ChIP. The expectations for enrichment of elongation factors and phosphorylated CTD are largely based on ChIP data. Because ChIP fails to capture RNAPII enrichment at UASs and promoters, it is possible that ChIP also fails to capture promoter interaction of factors involved in elongation as well.

      Factors important for elongation can also function at the promoter. For example, Ctk1 is required for the dissociation of basal transcription factors from RNAPII at the promoter (Ahn et al., 2009). Transcriptional induction leads to increases in Ctk1 ChEC enrichment both over the promoter and over the 3’ end of the transcribed region (Figure 4C). Dynamics of Spt4/5 association with RNAPII from in vitro imaging (Rosen et al., 2020) indicate that the majority of Spt4/5 binding to RNAPII does not lead to elongation; Spt4/5 frequently dissociates from DNA-bound RNAPII. Association of Spt4/5 with RNAPII may represent a slow, inefficient step in the transition to productive elongation. If so, then ChEC-seq2 may capture transient Spt4/5 interactions that occur prior to productive elongation, producing enrichment of Spt5 at the promoter.

      (2) Finally, the role of nuclear pore binding by Gcn4 is explored, although the results do not seem convincing. (10) In Figure 7, it's not convincing to me that ChEC is revealing the reason for the transcriptional defect in the Gcn4 PD mutant. The plots in panel D look nearly the same and I don't follow the authors' description of the differences stated in the text. In panel A, replotting the data in some other way might make the transcriptional differences between WT and Gcn4 PD mutants more obvious.

      The phenotype of the gcn4-pd mutant is a quantitative decrease in transcription and this leads to a quantitative decrease, rather than qualitative loss, of RNA polymerase II over the promoter, without impacting the association of RNA polymerase II over the UAS region. This effect is small but statistically significant (p = 4e5). We have changed the title of this section of the manuscript to “ChEC-seq2 suggests a role for the NPC in stabilizing promoter association of RNAPII”. Also, to make comparison clearer, we have plotted the data together in the revised figure (Figure 7D).

      The magnitude of the decrease is not large, but we would highlight that it is almost as large as that produced by inhibiting the Kin28 kinase (Figure 5H). Because the promoter-bound RNAPII is poorly captured by ChIP, this effect might be difficult to observe by techniques other than ChEC. Obviously, more mechanistic studies will need to be performed to fully understand this phenotype, but this result supports a role for the interaction with the nuclear pore complex in either enhancing the transfer of RNA polymerase II from the enhancer to the promoter or in preventing its dissociation from the promoter.

      I think that the related methods cut&run/cut&tag have been used to map elongating pol II. The authors should summarize what is known from this approach in the introduction and/or discussion. 

      CUT&RUN has been used to map RNAPII in mammals, but we are not aware of reports in S. cerevisiae. Work from the Henikoff Lab in yeast mapped transcription factors and histone modifications (PMIDs 28079019 and 31232687). A report using CUT&RUN in a human cell line reported a promoter-5’ bias of RNAPII that appeared to be dependent on fragment length (PMID 33070289). Regardless, the report highlights a key distinction between yeast and other eukaryotes: paused RNAPII. Indeed, paused RNAPII dominates ChIP-seq tracks in metazoans, and so we are hesitant to draw comparisons between CUT&RUN in other species and ChEC-seq2 in S. cerevisiae.

      Are the Rpb1, Rpb3, TFIIA, and TFIIE cleavage patterns expected based on the known structure of the PIC (Figures 2C, E)? 

      Rpb1 and Rpb3 show peaks at approximately -17 and +34 with respect to TATA. TFIIA (Toa2) shows peaks at -12 and +12. TFIIE (Tfa1) shows a peak around +34 (Figure 2C & E).

      As shown in the supplementary movie (based on the cMed-PIC structure; PDB #5OQM; Schilbach et al., 2017), upon binding to TBP/TFIID, TFIIA would be expected to cleave slightly upstream and downstream of the protected TATA (-12 and +12), while TFIIE binds downstream after the +12 site is protected and would be closest to the +34 unprotected site (to the right in the image below). RNAPII, which binds the fully assembled PIC, should be able to access either the upstream site (-12) or the downstream site (+34). Rpb1’s unstructured carboxy terminal domain, to which MNase is fused, would give it maximum flexibility, which likely explains why Rpb1 cleaves both at -12 and +34, with a preference for -12. Rpb3 also cleaves both sites, but without an obvious preference. 

      Author response image 1.

      Author response image 2.

      cleavage at -12, +12 and +34

      Author response image 3.

      Highlighted sites corresponding to the peaks in TFIIA assembled with TBP:

      Author response image 4.

      The complete PIC, protecting the +12 site, but leaving the +34 site exposed: 

      (6) Figure 2 S1: Pol II ChIP in the coding region gives a better correlation with transcription vs ChEC in promoters. Also, Pol II ChIP at terminators is almost as good as ChEC at promoters for estimating transcription. This latter point seems at odds with the text. The authors should comment on this and modify the text as needed. 

      Thank you for this comment.  We have clarified the text.

      In Figures 4 and 5, it's hard to tell how well changes in transcription correlate with changes in Pol II ChEC signals. It might be helpful to have a scatterplot or some other type of plot so that this relationship can be better evaluated. 

      While we find corresponding increase/decrease in ChEC-seq2 signal in genes identified as up/downregulated by SLAM-seq, the magnitude in change is not well correlated between the two techniques.  This was not surprising, because neither ChIP nor ChEC correlate especially well with SLAM-seq (Figure 2 – supplement 1E).

      In Figure 5, it's unclear why Pol association with Rap1 is being measured. Buratowski/Gelles showed that Pol associates with strong acidic activators - presumably through Mediator. Rap1 supposedly does not bind Mediator - so how is Pol associating here? Perhaps it would be better to measure Pol binding at STM genes that show Mediator-UAS binding. 

      Thank you; this is a good point. We chose Rap1 because we had generated high-confidence binding sites in our strains under these conditions by ChEC-seq2. The results suggest that RNAPII is recruited well to these sites and that this recruitment does not require TFIIB. However, in disagreement with the notion that Mediator does not interact with Rap1, ChEC with Mediator subunits Med1 and Med8 also shows peaks at these sites (new Figure 5F; the old Figure 5F is now Figure 5 – Supplement 1). Therefore, either these sites are co-occupied by other transcription factors that bind Mediator, or Mediator is recruited by Rap1. In either case, this correlates with binding of RNAPII.

      Reviewer 2:

      (1) The term "nascent transcription" is all too often used interchangeably for NET-seq, PRO-seq, 4sUseq, and other assays that often provide different types of information. The authors should make it clear their use of the term refers to SLAM-seq data. 

      We have clarified throughout the manuscript that "nascent transcription" refers to transcription measured by SLAM-seq.

      The authors should explicitly state that experiments were performed in S. cerevisiae in the Results section. 

      We have made it clear in the title and the text that these experiments were performed in S. cerevisiae.

      Lines 216-218 state that "None of the 24 predicted the strong signal over the transcribed region with promoter depletion characteristic of ChIP-seq". I understand the authors' point, but there are parameter combinations that produce a flat profile with slightly less signal over the promoter (e.g., 5 sec dwell times and 3000 bp/ min elongation rate). If flanking windows were included, this profile would look something like ChIP-seq. I'd encourage the authors to be more precise with their language. 

      Thank you for highlighting this over-statement.

      We have now clarified the text and added another supplementary panel as follows:

      “While some combinations predicted a relatively flat distribution across the gene with lower levels in the promoter, none of the 24 predicted the strong signal over the transcribed region with promoter depletion characteristic of ChIP-seq. Only very short promoter dwell times (i.e., < 1s), produced the low promoter occupancy seen in ChIP-seq (Figure 2 – supplement 1F).”
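      The dependence of promoter occupancy on dwell time can be illustrated with a back-of-the-envelope steady-state calculation. This is a minimal sketch, not the authors' stochastic model: it assumes a single promoter dwell step followed by elongation at a constant rate, and the 250 bp promoter window is a hypothetical value chosen only for illustration. Under these assumptions the time-averaged occupancy per base pair is proportional to the time spent per base pair, so gene length cancels out.

```python
def promoter_to_gene_density_ratio(dwell_s, elong_bp_per_min, promoter_bp=250):
    """Per-bp occupancy at the promoter divided by per-bp occupancy over the gene body.

    dwell_s          -- promoter dwell time in seconds
    elong_bp_per_min -- elongation rate in bp per minute
    promoter_bp      -- hypothetical promoter window width in bp (an assumption)
    """
    elong_bp_per_s = elong_bp_per_min / 60.0
    promoter_density = dwell_s / promoter_bp   # seconds spent per promoter bp
    gene_density = 1.0 / elong_bp_per_s        # seconds spent per transcribed bp
    return promoter_density / gene_density


if __name__ == "__main__":
    # Ratio scales linearly with dwell time at a fixed elongation rate.
    for dwell in (0.5, 1.0, 5.0):
        print(dwell, promoter_to_gene_density_ratio(dwell, 3000))
```

      A ratio below 1 corresponds to promoter depletion relative to the gene body in this toy model. The absolute values depend entirely on the assumed window width, so only the trend with dwell time should be read from it.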

    1. eLife Assessment

      This work presents an important method for depleting ribosomal RNA from bacterial single-cell RNA sequencing libraries, enabling the study of cellular heterogeneity within microbial biofilms. The approach convincingly identifies a small subpopulation of cells at the biofilm's base with upregulated PdeI expression, offering invaluable insights into the biology of bacterial biofilms and the formation of persister cells. Further integrated analysis of gene interactions within these datasets could deepen our understanding of biofilm dynamics and resilience.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yan and colleagues introduce a modification to the previously published PETRI-seq bacterial single cell protocol to include a ribosomal depletion step based on a DNA probe set that selectively hybridizes with ribosome-derived (rRNA) cDNA fragments. They show that their modification of the PETRI-seq protocol increases the fraction of informative non-rRNA reads from ~4-10% to 54-92%. The authors apply their protocol to investigating heterogeneity in a biofilm model of E. coli, and convincingly show how their technology can detect minority subpopulations within a complex community.

      Strengths:

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single cell RNA-seq.
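      The practical consequence of the read fractions quoted in the summary above can be sketched with simple arithmetic: the number of reads that must be sequenced to obtain a target number of informative (non-rRNA) reads is the target divided by the informative fraction. The function name and the target of one million informative reads are illustrative assumptions, not values from the manuscript.

```python
import math


def reads_required(target_informative_reads, informative_fraction):
    """Total reads to sequence so that the expected number of non-rRNA reads meets the target."""
    if not 0 < informative_fraction <= 1:
        raise ValueError("informative_fraction must be in (0, 1]")
    return math.ceil(target_informative_reads / informative_fraction)


if __name__ == "__main__":
    target = 1_000_000  # hypothetical target number of informative reads
    # Fractions taken from the ranges quoted in the review (~4-10% vs 54-92%).
    for label, fraction in (("without depletion", 0.04), ("without depletion", 0.10),
                            ("with depletion", 0.54), ("with depletion", 0.92)):
        print(label, fraction, reads_required(target, fraction))
```

      This ignores reads lost to other causes (unmapped reads, barcode errors), so it gives only a rough sense of how depletion translates into sequencing cost.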

    3. Reviewer #2 (Public review):

      Summary:

      This work introduces a new method of depleting the ribosomal reads from the single-cell RNA sequencing library prepared with one of the prokaryotic scRNA-seq techniques, PETRI-seq. The advance is very useful since it allows broader access to the technology by lowering the cost of sequencing. It also allows more transcript recovery with fewer sequencing reads. The authors demonstrate the utility and performance of the method for three different model species and find a subpopulation of cells in the E. coli biofilm that express a protein, PdeI, which causes elevated c-di-GMP levels. These cells were shown to be in a state that promotes persister formation in response to ampicillin treatment.

      Strengths:

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI which is causing the elevated c-di-GMP levels that are associated with persister formation. This finding highlights the potentially complex role of PdeI in regulation of c-di-GMP levels and persister formation in microbial biofilms.

      Weaknesses:

      Given many current methods that also introduce different techniques for ribosomal RNA depletion in bacterial single-cell RNA sequencing, it is unclear what is the place and role of RiboD-PETRI. The efficiency of rRNA depletion varies greatly between species for the majority of the available methods, so it is not easy to select the best fitting technique for a specific application.

      Despite transcriptome-wide coverage, the authors focused on the role of a single heterogeneously expressed gene, PdeI. A more integrated analysis of multiple genes and/or interactions between them using these data could reveal more insights into the biofilm biology.

      The authors should also present the UMIs capture metrics for RiboD-PETRI method for all cells passing initial quality filter (>=15 UMIs/cell) both in the text and in the figures. Selection of the top few cells with higher UMI count may introduce biological biases in the analysis (the top 5% of cells could represent a distinct subpopulation with very high gene expression due to a biological process). For single-cell RNA sequencing, showing the statistics for a 'top' group of cells creates confusion and inflates the perceived resolution, especially when used to compare to other methods (e.g. the parent method PETRI-seq itself).
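      The difference between the two ways of reporting can be sketched as follows. This is a minimal illustration, assuming a plain list of UMI totals per cell; the 15-UMI filter and the 5% top fraction come from the comment above, while the function name and the example counts are hypothetical.

```python
from statistics import median


def umi_summary(umis_per_cell, min_umis=15, top_fraction=0.05):
    """Median UMIs per cell for all cells passing the filter and for the top fraction only."""
    passing = sorted((u for u in umis_per_cell if u >= min_umis), reverse=True)
    if not passing:
        raise ValueError("no cells pass the UMI filter")
    n_top = max(1, int(len(passing) * top_fraction))  # at least one cell in the top group
    return {
        "n_cells": len(passing),
        "median_all": median(passing),
        "median_top": median(passing[:n_top]),
    }


if __name__ == "__main__":
    # Hypothetical counts: one cell far above the rest dominates the "top" statistic.
    print(umi_summary([5, 15, 20, 25, 30, 1000]))
```

      Reporting both numbers side by side makes it clear how far the top group departs from the typical cell.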

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      The work introduces a valuable new method for depleting the ribosomal RNA from bacterial single-cell RNA sequencing libraries and shows that this method is applicable to studying the heterogeneity in microbial biofilms. The evidence for a small subpopulation of cells at the bottom of the biofilm which upregulates PdeI expression is solid. However, more investigation into the unresolved functional relationship between PdeI and c-di-GMP levels with the help of other genes co-expressed in the same cluster would have made the conclusions more significant. 

      Many thanks for eLife’s assessment of our manuscript and the constructive feedback. We are encouraged by the recognition of our bacterial single-cell RNA-seq methodology as valuable and its efficacy in studying bacterial population heterogeneity. We appreciate the suggestion for additional investigation into the functional relationship between PdeI and c-di-GMP levels. We concur that such an exploration could substantially enhance the impact of our conclusions. To address this, we have implemented the following revisions: We have expanded our data analysis to identify and characterize genes co-expressed with PdeI within the same cellular cluster (Fig. 3F, G, Response Fig. 10); We conducted additional experiments to validate the functional relationships between PdeI and c-di-GMP, followed by detailed phenotypic analyses (Response Fig. 9B). Our analysis reveals that while other marker genes in this cluster are co-expressed, they do not significantly impact biofilm formation or directly relate to c-di-GMP or PdeI. We believe these revisions have substantially enhanced the comprehensiveness and context of our manuscript, thereby reinforcing the significance of our discoveries related to microbial biofilms. The expanded investigation provides a more thorough understanding of the PdeI-associated subpopulation and its role in biofilm formation, addressing the concerns raised in the initial assessment.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Yan and colleagues introduce a modification to the previously published PETRI-seq bacterial single-cell protocol to include a ribosomal depletion step based on a DNA probe set that selectively hybridizes with ribosome-derived (rRNA) cDNA fragments. They show that their modification of the PETRI-seq protocol increases the fraction of informative non-rRNA reads from ~4-10% to 54-92%. The authors apply their protocol to investigating heterogeneity in a biofilm model of E. coli, and convincingly show how their technology can detect minority subpopulations within a complex community. 

      Strengths: 

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single-cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single-cell RNA-seq. 

      Weaknesses: 

      The manuscript is written in a very compressed style and many technical details of the evaluations conducted are unclear and processed data has not been made available for evaluation, limiting the ability of the reader to independently judge the merits of the method. 

      Thank you for your thoughtful and constructive review of our manuscript. We appreciate your recognition of the strengths of our work and the potential impact of our modified PETRI-seq protocol on the field of bacterial single-cell RNA-seq. We are grateful for the opportunity to address your concerns and improve the clarity and accessibility of our manuscript.

      We acknowledge your feedback regarding the compressed writing style and lack of technical details, which are constrained by the requirements of the Short Report format in eLife. We have addressed these issues in our revised manuscript as follows:

      (1) Expanded methodology section: We have provided a more comprehensive description of our experimental procedures, including detailed protocols for the ribosomal depletion step (lines 435-453) and data analysis pipeline (lines 471-528). This will enable readers to better understand and potentially replicate our methods.

      (2) Clarification of technical evaluations: We have elaborated on the specifics of our evaluations, including the criteria used for assessing the efficiency of ribosomal depletion (lines 99-120), and the methods employed for identifying and characterizing subpopulations (lines 155-159, 161-163 and 163-167).

      (3) Data availability: We apologize for the oversight in not making our processed data readily available. We have deposited all relevant datasets, including raw and source data, in appropriate public repositories (GEO: GSE260458) and provide clear instructions for accessing this data in the revised manuscript.

      (4) Supplementary information: To maintain the concise nature of the main text while providing necessary details, we have included additional supplementary information. This will cover extended methodology (lines 311-318, 321-323, 327-340, 450-453, 533, and 578-589), detailed statistical analyses (lines 492-493, 499-501 and 509-528), and comprehensive data tables to support our findings.

      We believe these changes significantly improved the clarity and reproducibility of our work, allowing readers to better evaluate the merits of our method.

      Reviewer #2 (Public Review): 

      Summary: 

      This work introduces a new method of depleting the ribosomal reads from the single-cell RNA sequencing library prepared with one of the prokaryotic scRNA-seq techniques, PETRI-seq. The advance is very useful since it allows broader access to the technology by lowering the cost of sequencing. It also allows more transcript recovery with fewer sequencing reads. The authors demonstrate the utility and performance of the method for three different model species and find a subpopulation of cells in the E. coli biofilm that express a protein, PdeI, which causes elevated c-di-GMP levels. These cells were shown to be in a state that promotes persister formation in response to ampicillin treatment.

      Strengths: 

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI which is causing the elevated c-di-GMP levels that are associated with persister formation. Given that PdeI is a phosphodiesterase, which is supposed to promote hydrolysis of c-di-GMP, this finding is unexpected.

      Weaknesses: 

      With the descriptions and writing of the manuscript, it is hard to place the findings about the PdeI into existing context (i.e. it is well known that c-di-GMP is involved in biofilm development and is heterogeneously distributed in several species' biofilms; it is also known that E.coli diesterases regulate this second messenger, i.e. https://journals.asm.org/doi/full/10.1128/jb.00604-15). 

      There is also no explanation for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels. Perhaps the examination of the rest of the genes in cluster 2 of the biofilm sample could be useful to explain the observed association. 

      Thank you for your thoughtful and constructive review of our manuscript. We are pleased that the reviewer recognizes the value and efficiency of our rRNA depletion method for PETRI-seq, as well as its potential impact on the field. We would like to address the points raised by the reviewer and provide additional context and clarification regarding the function of PdeI in c-di-GMP regulation.

      We acknowledge that c-di-GMP’s role in biofilm development and its heterogeneous distribution in bacterial biofilms are well studied. We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI is predicted to function as a phosphodiesterase involved in c-di-GMP degradation, based on sequence analysis demonstrating the presence of an intact EAL domain, which is known for this function. However, it is important to note that PdeI also harbors a divergent GGDEF domain, typically associated with c-di-GMP synthesis. This dual-domain structure indicates that PdeI may play complex regulatory roles. Previous studies have shown that knocking out the major phosphodiesterase PdeH in E. coli results in the accumulation of c-di-GMP. Moreover, introducing a point mutation (G412S) in PdeI's divergent GGDEF domain within this PdeH knockout background led to decreased c-di-GMP levels (ref. 2). This finding implies that the wild-type GGDEF domain in PdeI contributes to maintaining or increasing cellular c-di-GMP levels.

      Importantly, our single-cell experiments demonstrated a positive correlation between PdeI expression levels and c-di-GMP levels (Figure 4D). In this revision, we also constructed a PdeI(G412S)-BFP mutation strain. Notably, our observations of this strain revealed that c-di-GMP levels remained constant despite an increase in BFP fluorescence, which serves as a proxy for PdeI(G412S) expression levels (Figure 4D). This experimental evidence, coupled with domain analyses, suggests that PdeI may also contribute to c-di-GMP synthesis, rebutting the notion that it acts solely as a phosphodiesterase. HPLC LC-MS/MS analysis further confirmed that the overexpression of PdeI, induced by arabinose, resulted in increased c-di-GMP levels (Fig. 4E). These findings strongly suggest that PdeI plays a pivotal role in upregulating c-di-GMP levels.

      Our further analysis indicated that PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results showing that PdeI is a membrane-associated protein, we hypothesize that PdeI acts as a sensor, integrating environmental signals with c-di-GMP production under complex regulatory mechanisms.

      We understand your interest in the other genes present in cluster 2 of the biofilm and their potential relationship to PdeI and c-di-GMP. Upon careful analysis, we have determined that the other marker genes in this cluster do not significantly impact biofilm formation, nor have we identified any direct relationship between these genes, c-di-GMP, or PdeI. Our focus on PdeI within this cluster is justified by its unique and significant role in c-di-GMP regulation and biofilm formation, as demonstrated by our experimental results. While other genes in this cluster may be co-expressed, their functions appear unrelated to the PdeI-c-di-GMP pathway we are investigating. Therefore, we opted not to elaborate on these genes in our main discussion, as they do not contribute directly to our understanding of the PdeI-c-di-GMP association. However, we can include a brief mention of these genes in the manuscript, indicating their lack of relevance to the PdeI-c-di-GMP pathway. This addition will provide a more comprehensive view of the cluster's composition while maintaining our focus on the key findings related to PdeI and c-di-GMP.

      We have also included the aforementioned explanations and supporting experimental data within the manuscript to clarify this important point (lines 193-217). Thank you for highlighting this apparent contradiction, allowing us to provide a more detailed explanation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, I found the main text of the manuscript well written and easy to understand, though too compressed in parts to fully understand the details of the work presented, some examples are outlined below. The materials and methods appeared to be less carefully compiled and could use some careful proof-reading for spelling (e.g. repeated use of "minuts" for minutes, "datas" for data) and grammar and sentence fragments (e.g. "For exponential period E. coli data." Line 333). In general, the meaning is still clear enough to be understood. I also was unable to find figure captions for the supplementary figures, making these difficult to understand. 

      We appreciate your careful review, which has helped us improve the clarity and quality of our manuscript. We acknowledge that some parts of the main text may have been overly compressed due to the Short Report format in eLife. We have thoroughly reviewed the manuscript and expanded on key areas to provide more comprehensive explanations. We have carefully revised the Materials and Methods section: we corrected all spelling errors, including "minuts" to "minutes" and "datas" to "data", and corrected grammatical issues and sentence fragments throughout the section. We sincerely apologize for the omission of captions for the supplementary figures. We have now added detailed captions for all supplementary figures to ensure they are easily understandable. We believe these revisions address your concerns and enhance the overall readability and comprehension of our work.

      General comments: 

      (1) To evaluate the performance of RiboD-PETRI, it would be helpful to have more details in general, particularly to do with the development of the sequencing protocol and the statistics shown. Some examples: How many reads were sequenced in each experiment? Of these, how many are mapped to the bacterial genome? How many reads were recovered per cell? Have the authors performed some kind of subsampling analysis to determine if their sequencing has saturated the detection of expressed genes? The authors show e.g. correlations between classic PETRI-seq and RiboD-PETRI for E. coli in Figure 1, but also have similar data for C. crescentus and S. aureus - do these data behave similarly? These are just a few examples, but I'm sure the authors have asked themselves many similar questions while developing this project; more details, hard numbers, and comparisons would be very much appreciated. 

      Thank you for your valuable feedback. To address your concerns, we have added a table in the supplementary material that clarifies the details of sequencing.

      The PETRI-seq and RiboD-PETRI data for C. crescentus correlate relatively well. The correlation for the S. aureus (SA) data is lower: the calculated correlation coefficient is only about 0.47. We attribute this to the different sequencing depths of RiboD-PETRI and PETRI-seq, which result in much higher gene expression values in the RiboD-PETRI sequencing results than in PETRI-seq. A coefficient of this size indicates a positive correlation between the two sets of data, but not a particularly strong one. However, the comparison covers the expression of 2763 genes in total, so even this relatively low correlation coefficient shows that there is some consistency between the two groups of samples.

      Author response image 1.

      Assessment of the effect of rRNA depletion on transcriptional profiles of (A) C. crescentus (CC) and (B) S. aureus (SA). The Pearson correlation coefficient (r) of UMI counts per gene (log2 UMIs) between RiboD-PETRI and PETRI-seq was calculated for 4097 genes (A) and 2763 genes (B). The "ΔΔ" label represents the RiboD-PETRI protocol; the "Ctrl" label represents the classic PETRI-seq protocol we performed. Each point represents a gene.
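
      As a minimal sketch of how a per-gene correlation of this kind could be computed, the snippet below takes two vectors of UMI counts for the same genes, log2-transforms them, and returns the Pearson coefficient. The pseudocount of 1, the function name, and the five example counts are illustrative assumptions, not the code or data used in the study.

```python
import math


def pearson_log2(umis_a, umis_b, pseudocount=1.0):
    """Pearson r between two per-gene UMI vectors after log2(x + pseudocount).

    The pseudocount is an assumption made here so that genes with zero UMIs
    can be log-transformed; the response does not state how zeros were handled.
    """
    if len(umis_a) != len(umis_b) or len(umis_a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 genes")
    xs = [math.log2(a + pseudocount) for a in umis_a]
    ys = [math.log2(b + pseudocount) for b in umis_b]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene UMI totals for the same five genes in two libraries.
ribod_petri = [120, 15, 300, 0, 48]
petri_seq = [30, 2, 95, 1, 10]
r = pearson_log2(ribod_petri, petri_seq)
```

      Because Pearson r is unchanged by shifting or rescaling either vector, a uniform difference in sequencing depth between two libraries moves the points without lowering r by itself; depth differences matter mainly through the noisier counts of lowly expressed genes in the shallower library.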

      (2) Additionally, I think it is critical that the authors provide processed read counts per cell and gene in their supplementary information to allow others to investigate the performance of their method without going back to raw FASTQ files, as this can represent a significant hurdle for reanalysis. 

      Thank you for your suggestion. However, it's important to clarify that reads and UMIs (Unique Molecular Identifiers) are distinct concepts in single-cell RNA sequencing. Reads can be influenced by PCR amplification during library construction, making their quantity less stable. In contrast, UMIs serve as a more reliable indicator of the number of mRNA molecules detected after PCR amplification. Throughout our study, we primarily utilized UMI counts for quantification. To address your concern about data accessibility, we have included the UMI counts per cell and gene in our supplementary materials (Tables S7-15; some of the files are too large and are therefore stored in GEO: GSE260458). This approach provides a more accurate representation of gene expression levels and allows for robust reanalysis without the need to process raw FASTQ files.

      (3) Finally, the authors should also discuss other approaches to ribosomal depletion in bacterial scRNA-seq. One of the figures appears to contain such a comparison, but it is never mentioned in the text that I can find, and one could read this manuscript and come away believing this is the first attempt to deplete rRNA from bacterial scRNA-seq. 

      We have addressed this concern by including a comparison of different methods for depleting rRNA from bacterial scRNA-seq in Table S4 and by adding a short comparison to the text, as follows: “Additionally, we compared our findings with other reported methods (Fig. 1B; Table S4). The original PETRI-seq protocol, which does not include an rRNA depletion step, exhibited an mRNA detection rate of approximately 5%. The MicroSPLiT-seq method, which utilizes Poly A Polymerase for mRNA enrichment, achieved a detection rate of 7%. Similarly, M3-seq and BacDrop-seq, which employ RNase H to digest rRNA post-DNA probe hybridization in cells, reported mRNA detection rates of 65% and 61%, respectively. MATQ-DASH, which utilizes Cas9-mediated targeted rRNA depletion, yielded a detection rate of 30%. Among these, RiboD-PETRI demonstrated superior performance in mRNA detection while requiring the least sequencing depth.” We have added this content to the main text (lines 110-120), specifically in relation to Figure 1B and Table S4. This addition provides context for our method and clarifies its position among existing techniques.

      Detailed comments: 

      Line 78: the authors describe the multiplet frequency, but it is not clear to me how this was determined, for which experiments, or where in the SI I should look to see this. Often this is done by mixing cultures of two distinct bacteria, but I see no evidence of this key experiment in the manuscript. 

      The multiplet frequency we discuss in the manuscript was not determined by experimentally mixing distinct bacterial cultures. The PETRI-seq and microSPLiT articles both performed experiments mixing two libraries to determine the single-cell rate, and both gave good results. Our technique is derived from these two methods (mainly PETRI-seq), and the main difference lies in the later RiboD step, so we did not repeat this experiment separately. The multiplet frequencies reported here are therefore theoretical predictions based on our sequencing results, calculated using a Poisson distribution. We have made this distinction clearer in our manuscript (lines 93-97). The method is described in the Materials and Methods section (lines 520-528), and the data are available in Table S2. To elaborate:

      To assess the efficiency of single-cell capture in RiboD-PETRI, we calculated the multiplet frequency using a Poisson distribution based on our sequencing results.

      (1) Definition: In our study, multiplet frequency is defined as the probability of a non-empty barcode corresponding to more than one cell.

      (2) Calculation Method: We use a Poisson distribution-based approach to calculate the predicted multiplet frequency. The process involves several steps:

      We first calculate the proportion of barcodes corresponding to zero cells: P(0) = e^(-λ). Then, we calculate the proportion corresponding to one cell: P(1) = λ·e^(-λ). We derive the proportion for more than zero cells: P(≥1) = 1 - P(0). And for more than one cell: P(≥2) = 1 - P(1) - P(0). Finally, the multiplet frequency is calculated as: P(≥2) / P(≥1).

      (3) Parameter λ: This is the ratio of the number of cells to the total number of possible barcode combinations. For instance, when detecting 10,000 cells, λ is 10,000 divided by the total number of possible barcode combinations.
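
      The Poisson calculation described above can be sketched as follows. The sketch takes the multiplet frequency as P(≥2) / P(≥1), which follows from the definition in (1) as the probability that a non-empty barcode corresponds to more than one cell. The barcode count of 96^3 is an assumption made only for illustration (three rounds of 96 barcodes); the response does not state the number of combinations used.

```python
import math


def multiplet_frequency(n_cells, n_barcodes):
    """Predicted fraction of non-empty barcodes that hold more than one cell."""
    if n_cells <= 0 or n_barcodes <= 0:
        raise ValueError("n_cells and n_barcodes must be positive")
    lam = n_cells / n_barcodes      # expected number of cells per barcode
    p0 = math.exp(-lam)             # P(0): barcode is empty
    p1 = lam * p0                   # P(1): barcode holds exactly one cell
    p_ge1 = 1.0 - p0                # P(>=1): barcode is non-empty
    p_ge2 = 1.0 - p1 - p0           # P(>=2): barcode holds a multiplet
    return p_ge2 / p_ge1


# Assumed barcode space, for illustration only (not taken from the response).
ASSUMED_BARCODE_COMBINATIONS = 96 ** 3

freq_10k = multiplet_frequency(10_000, ASSUMED_BARCODE_COMBINATIONS)
freq_20k = multiplet_frequency(20_000, ASSUMED_BARCODE_COMBINATIONS)
```

      For small λ the result is close to λ/2, so under this model doubling the number of cells loaded into the same barcode space roughly doubles the predicted multiplet frequency.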

      Line 94: the concept of "percentage of gene expression" is never clearly defined. Does this mean the authors detect 99.86% of genes expressed in some cells? How is "expressed" defined - is this just detecting a single UMI? 

      The term "percentage of gene expression" refers to the proportion of genes in the bacterial strain that were detected as expressed in the sequenced cell population. Specifically, in this context, it means that 99.86% of all genes in the bacterial strain were detected as expressed in at least one cell in our sequencing results. To define "expressed" more clearly: a gene is considered expressed if at least one UMI (Unique Molecular Identifier) is detected for it in any cell in the population. This definition allows for the detection of even low-level gene expression. To enhance clarity in the manuscript, we have rephrased the sentence as “transcriptome-wide gene coverage across the cell population”.
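
      The definition above can be written as a short sketch: a gene counts as covered if any cell has at least one UMI for it. The data layout (a dictionary of per-cell gene counts), the function name, and the toy gene names are hypothetical and chosen only to illustrate the definition.

```python
def gene_coverage_percent(counts_by_cell, all_genes):
    """Percentage of annotated genes with >= 1 UMI in at least one cell."""
    annotated = set(all_genes)
    if not annotated:
        raise ValueError("all_genes must not be empty")
    detected = set()
    for cell_counts in counts_by_cell.values():
        for gene, umis in cell_counts.items():
            if umis >= 1:
                detected.add(gene)
    # Ignore any counted feature that is not in the annotated gene list.
    detected &= annotated
    return 100.0 * len(detected) / len(annotated)


# Toy example: four annotated genes, three of them seen in at least one cell.
toy_counts = {
    "cell_1": {"geneA": 2, "geneB": 0},
    "cell_2": {"geneB": 1, "geneC": 5},
    "cell_3": {"geneD": 0},
}
coverage = gene_coverage_percent(toy_counts, ["geneA", "geneB", "geneC", "geneD"])
```

      This is a population-level statistic: it grows with the number of cells sequenced and says nothing about how many genes are detected in any individual cell.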

      Line 98: The authors discuss the number of recovered UMIs throughout this paragraph, but there is no clear discussion of the number of detected expressed genes per cell. Could the authors include a discussion of this as well, as this is another important measure of sensitivity? 

      We appreciate your suggestion to include a discussion on the number of detected expressed genes per cell, as this is indeed another important measure of sensitivity. We would like to clarify that we have actually included statistics on the number of genes detected across all cells in the main text of our paper. This information is presented as percentages. However, we understand that you may be looking for a more detailed representation, similar to the UMI statistics we provided. To address this, we have now added a new analysis showing the number of genes detected per cell (lines 132-133, 138-139, 144-145 and 184-186, Fig. 2B, 3B and S2B). This additional result complements our existing UMI data and provides a more comprehensive view of the sensitivity of our method. We have included this new gene-per-cell statistical graph in the supplementary materials.

      Figure 1B: I presume ctrl and delta delta represent the classic PETRI-seq and RiboD protocols, respectively, but this is not specified. This should be clarified in the figure caption, or the names changed. 

      We appreciate you bringing this to our attention. We acknowledge that the labeling in the figure could have been clearer. We have now clarified this information in the figure caption. To provide more specificity: the "ΔΔ" label represents the RiboD-PETRI protocol; the "Ctrl" label represents the classic PETRI-seq protocol we performed. We have updated the figure caption to include these details, which should help readers better understand the protocols being compared in the figure.

      Line 104: the authors claim "This performance surpassed other reported bacterial scRNA-seq methods" with a long number of references to other methods. "Performance" is not clearly defined, and it is unclear what the exact claim being made is. The authors should clarify what they're claiming, and further discuss the other methods and comparisons they have made with them in a thorough and fair fashion. 

      We appreciate your request for clarification, and we acknowledge that our definition of "performance" should have been more explicit. We would like to clarify that in this context, we define performance primarily in terms of the proportion of mRNA captured. Our improved method demonstrates a significantly higher rate of rRNA removal compared to other bacterial single-cell library construction methods. This results in a higher proportion of mRNA in our sequencing data, which we consider a key performance metric for single-cell RNA sequencing in bacteria. Additionally, when compared to our previous method, PETRI-seq, our improved approach not only enhances rRNA removal but also reduces library construction costs. This dual improvement in both data quality and cost-effectiveness is what we intended to convey with our performance claim.

      We recognize that a more thorough and fair discussion of other methods and their comparisons would be beneficial. We have summarized the comparison in Table S4 and added a short discussion to the main text (lines 106-120). This addition provides context for our method and clarifies its position among existing techniques.

      Figure 1D: Do the authors have any explanation for the relatively lower performance of their C. crescentus depletion? 

      We appreciate your attention to detail and the opportunity to address this point. The lower efficiency of rRNA removal in C. crescentus compared to other species can be attributed to inherent differences between species. It's important to note that a single method for rRNA depletion may not be universally effective across all bacterial species due to variations in their genetic makeup and rRNA structures. Different bacterial species can have unique rRNA sequences, secondary structures, or associated proteins that may affect the efficiency of our depletion method. This species-specific variation highlights the challenges in developing a one-size-fits-all approach for bacterial rRNA depletion. While our method has shown high efficiency across several species, the results with C. crescentus underscore the need for continued refinement and possibly species-specific optimizations in rRNA depletion techniques. We thank you for bringing attention to this point, as it provides valuable insight into the complexities of bacterial rRNA depletion and areas for future improvement in our method.

      Line 118: The authors claim RiboD-PETRI has a "consistent ability to unveil within-population heterogeneity", however the preceding paragraph shows it detects potential heterogeneity, but provides no evidence this inferred heterogeneity reflects the reality of gene expression in individual cells. 

      We appreciate your careful reading and the opportunity to clarify this point. We acknowledge that our wording may have been too assertive given the evidence presented. We acknowledge that the subpopulations of cells identified in other species have not undergone experimental verification. Our intention in presenting these results was to demonstrate RiboD-PETRI's capability to detect “potential” heterogeneity consistently across different bacterial species, showcasing the method's sensitivity and potential utility in exploring within-population diversity. However, we agree that without further experimental validation, we cannot definitively claim that these detected differences represent true biological heterogeneity in all cases. We have revised this section to reflect the current state of our findings more accurately, emphasizing that while RiboD-PETRI consistently detects potential heterogeneity across species, further experimental validation would be required to confirm the biological significance of the observations (lines 169-171).

      Figure 1 H&I: I'm not entirely sure what I am meant to see in these figures, presumably some evidence for heterogeneity in gene expression. Are there better visualizations that could be used to communicate this? 

      We appreciate your suggestion for improving the visualization of gene expression heterogeneity. We have explored alternative visualization methods in the revised manuscript. Specifically, for the expression levels of marker genes shown in Figure 1H (which is Figure 2D now), we have created violin plots (Supplementary Fig. 4). These plots offer a more comprehensive view of the distribution of expression levels across different cell populations, making it easier to discern heterogeneity. However, due to the number of marker genes and the resulting volume of data, these violin plots are quite extensive and would occupy a significant amount of space. Given the space constraints of the main figure, we propose to include these violin plots as Fig. S4, immediately following Figure 1 H&I (which is Figure 2D&E now). This arrangement will allow readers to access more detailed information about these marker genes while maintaining the concise style of the main figure.

      Regarding the pathway enrichment figure (Figure 2E), we have also considered your suggestion for improvement. We attempted to use a dot plot to display the KEGG pathway enrichment of the genes. However, our analysis revealed that the genes were only enriched in a single pathway. As a result, the visual representation using a dot plot still did not produce a particularly aesthetically pleasing or informative figure.

      Line 124: The authors state no significant batch effect was observed, but in the methods on line 344 they specify batch effects were removed using Harmony. It's unclear what exactly S2 is showing without a figure caption, but the authors should clarify this discrepancy. 

      We apologize for any confusion caused by the lack of a clear figure caption for Figure S2 (which is Figure S3D now). To address your concern, in addition to adding captions for the supplementary figures, we would also like to provide more context about the batch effect analysis. In Supplementary Fig. S3, Panel C represents the results without using Harmony for batch effect removal, while Panel D shows the results after applying Harmony. In both panels A and B, the distributions of samples one and two do not show substantial differences. Based on this observation, we concluded that there was no significant batch effect between the two samples. However, we acknowledge that even subtle batch effects could potentially influence downstream analyses. Therefore, out of an abundance of caution and to ensure the highest quality of our results, we decided to apply Harmony to remove any potential minor batch effects. This approach aligns with best practices in single-cell analysis, where even small technical variations are often accounted for to enhance the robustness of the results.

      To improve clarity, we have revised our manuscript to better explain this nuanced approach: 1. We have updated the statement to reflect that while no major batch effect was observed, we applied batch correction as a precautionary measure (lines 181-182). 2. We have added a detailed caption to Figure S3, explaining the comparison between non-corrected and batch-corrected data. 3. We have modified the methods section to clarify that Harmony was applied as a precautionary step, despite the absence of obvious batch effects (lines 492-493).

      Figure 2D: I found this panel fairly uninformative, is there a better way to communicate this finding? 

      Thank you for your feedback regarding Figure 2D. We have explored alternative ways to present this information, using a dot plot to display the enrichment pathways, as this is often an effective method for visualizing such data. Meanwhile, we also provided a more detailed textual description of the enrichment results in the main text, highlighting the most significant findings.

      Figure 2I: the figure itself and caption say GFP, but in the text and elsewhere the authors say this is a BFP fusion. 

      We appreciate your careful review of our manuscript and figures. We apologize for any confusion this may have caused. To clarify: Both GFP (Green Fluorescent Protein) and BFP (Blue Fluorescent Protein) were indeed used in our experiments, but for different purposes: 1. GFP was used for imaging to observe the location of PdeI in bacteria and persister cell growth, which is shown in Figure 4C and 4K. 2. BFP was used for cell sorting, imaging of location in the biofilm, and detecting the proportion of persister cells, which are shown in Figure 4D and 4F-J. To address this inconsistency and improve clarity, we have made the following corrections: 1. We have reviewed the main text to ensure that references to GFP and BFP are accurate and consistent with their respective uses in our experiments. 2. We have added a note in the figure caption for Figure 4C to explicitly state that this particular image shows GFP fluorescence for the location of PdeI. 3. In the methods section, we have provided a clear explanation of how both fluorescent proteins were used in different aspects of our study (lines 326-340).

      Line 156: The authors compare prices between RiboD and PETRI-seq. It would be helpful to provide a full cost breakdown, e.g. in supplementary information, as it is unclear exactly how the authors came to these numbers or where the major savings are (presumably in sequencing depth?) 

      We appreciate your suggestion to provide a more detailed cost breakdown, and we agree that this would enhance the transparency and reproducibility of our cost analysis. In response to your feedback, we have prepared a comprehensive cost breakdown that includes all materials and reagents used in the library preparation process. Additionally, we've factored in the sequencing depth (50G) and the unit price for sequencing (25¥/G). These calculations allow us to determine the cost per cell after sequencing. As you correctly surmised, a significant portion of the cost reduction is indeed related to sequencing depth. However, there are also savings in the library preparation steps that contribute to the overall cost-effectiveness of our method. We propose to include this detailed cost breakdown as a supplementary table (Table S6) in our paper. This table will provide a clear, itemized list of all expenses involved, including: 1. Reagents and materials for library preparation 2. Sequencing costs (depth and price per G) 3. Calculated cost per cell.
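
      The arithmetic behind a per-cell cost can be sketched as below. Only the sequencing depth (50G) and unit price (25¥/G) come from the response; the library preparation cost and the number of cells are hypothetical placeholders, since the itemized values are given in Table S6 rather than here.

```python
def cost_per_cell(library_cost_yuan, depth_g, price_per_g_yuan, n_cells):
    """Total cost (library preparation + sequencing) divided by cell number."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    sequencing_cost = depth_g * price_per_g_yuan
    return (library_cost_yuan + sequencing_cost) / n_cells


SEQUENCING_DEPTH_G = 50                    # stated in the response
PRICE_PER_G_YUAN = 25                      # stated in the response
HYPOTHETICAL_LIBRARY_COST_YUAN = 1000.0    # placeholder, not from Table S6
HYPOTHETICAL_CELL_NUMBER = 30_000          # placeholder, not from Table S6

sequencing_cost_yuan = SEQUENCING_DEPTH_G * PRICE_PER_G_YUAN
per_cell_yuan = cost_per_cell(
    HYPOTHETICAL_LIBRARY_COST_YUAN,
    SEQUENCING_DEPTH_G,
    PRICE_PER_G_YUAN,
    HYPOTHETICAL_CELL_NUMBER,
)
```

      With the stated depth and unit price, sequencing alone comes to 1,250¥ per library; the per-cell figure then depends on the library preparation cost and on how many cells are recovered.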

      Line 291: The design and production of the depletion probes are not clearly explained. How did the authors design them? How were they synthesized? Also, it appears the authors have separate probe sets for E. coli, C. crescentus, and S. aureus - this should be clarified, possibly in the main text.

      Thank you for your important questions regarding the design and production of our depletion probes. We included the detailed probe information in Supplementary Table S1; however, we did not clarify this information in the main text due to the constraints of the Short Report format in eLife. We appreciate the opportunity to provide clarifications.

      The core principle behind our probe design is that the probe sequences are reverse complementary to the r-cDNA sequences. This design allows for specific recognition of r-cDNA. The probes are then bound to magnetic beads, allowing the r-cDNA-probe-bead complexes to be separated from the rest of the library. To address your specific questions: 1. Probe Design: We designed separate probe sets for E. coli, C. crescentus, and S. aureus. Each set was specifically constructed to be reverse complementary to the r-cDNA sequences of its respective bacterial species. This species-specific approach ensures high efficiency and specificity in rRNA depletion for each organism. The hybrid DNA complex was then removed by Streptavidin magnetic beads. 2. Probe Synthesis: The probes were synthesized based on these design principles. 3. Species-Specific Probe Sets: You are correct in noting that we used separate probe sets for each bacterial species. We have clarified this important point in the main text to ensure readers understand the specificity of our approach. To further illustrate this process, we have created a schematic diagram showing the principle of rRNA removal and clarified the design principle in the figure legend of Fig. 1A.

      Line 362: I didn't see a description of the construction of the PdeI-BFP strain, I assume this would be important for anyone interested in the specific work on PdeI. 

      Thank you for your astute observation regarding the construction of the PdeI-BFP strain. We appreciate the opportunity to provide this important information. The PdeI-BFP strain was constructed as follows: 1. We cloned the pdeI gene along with its native promoter region (250bp) into a pBAD vector. 2. The original promoter region of the pBAD vector was removed to avoid any potential interference. 3. This construction enables the expression of the PdeI-BFP fusion protein to be regulated by the native promoter of pdeI, thus maintaining its physiological control mechanisms. 4. The BFP coding sequence was fused to the pdeI gene to create the PdeI-BFP fusion construct. We have added a detailed description of the PdeI-BFP strain construction to our methods section (lines 327-334).

      Reviewer #2 (Recommendations For The Authors): 

      (1) General remarks: 

      Reconsider using 'advanced' in the title. It is highly generic and misleading. Perhaps 'cost-efficient' would be a more precise substitute. 

      Thank you for your valuable suggestion. After careful consideration, we have decided to use "improved" in the title. Firstly, our method presents an efficient solution to a persistent challenge in bacterial single-cell RNA sequencing, specifically addressing rRNA abundance. Secondly, it facilitates precise exploration of bacterial population heterogeneity. We believe our method encompasses more than just cost-effectiveness, justifying the use of the term "improved."

      Consider expanding the introduction. The introduction does not explain the setup of the biological question or basic details such as the organism(s) for which the technique has been developed, or which species biofilms were studied. 

      Thank you for your valuable feedback regarding our introduction. We acknowledge that our writing style was compressed due to the constraints of the Short Report format in eLife. We appreciate the opportunity to expand this crucial section, which will improve the clarity and impact of our manuscript's introduction.

      We revised our introduction (lines 53-80) according to the following principles:

      (1) Initial Biological Question: We explained the initial biological question that motivated our research—understanding the heterogeneity in E. coli biofilms—to provide essential context for our technological development.

      (2) Limitations of Existing Techniques: We briefly described the limitations of current single-cell sequencing techniques for bacteria, particularly regarding their application in biofilm studies.

      (3) Introduction of Improved Technique: We introduced our improved technique, initially developed for E. coli.

      (4) Research Evolution: We highlighted how our research has evolved, demonstrating that our technique is applicable not only to E. coli but also to Gram-positive bacteria and other Gram-negative species, showcasing the broad applicability of our method.

      (5) Specific Organisms Studied: We provided examples of the specific organisms we studied, encompassing both Gram-positive and Gram-negative bacteria.

      (6) Potential Implications: Finally, we outlined the potential implications of our technique for studying bacterial heterogeneity across various species and contexts, extending beyond biofilms.

      (2) Writing remarks: 

      43-45 Reword: "Thus, we address a persistent challenge in bacterial single-cell RNA-seq regarding rRNA abundance, exemplifying the utility of this method in exploring biofilm heterogeneity.". 

      Thank you for highlighting this sentence and requesting a rewording. We appreciate the opportunity to improve the clarity and impact of our statement. We have reworded the sentence as: "Our method effectively tackles a long-standing issue in bacterial single-cell RNA-seq: the overwhelming abundance of rRNA. This advancement significantly enhances our ability to investigate the intricate heterogeneity within biofilms at unprecedented resolution." (lines 47-50)

      49 "Biofilms, comprising approximately 80% of chronic and recurrent microbial infections in the human body..." - probably meant 'contribute to'. 

      Thank you for catching this imprecision in our statement. We have reworded the sentence as: "Biofilms contribute to approximately 80% of chronic and recurrent microbial infections in the human body..."

      54-55 Please expand on "this". 

      Thank you for your request to expand on the use of "this" in the sentence. You're right that more clarity would be beneficial here. We have revised and expanded this section in lines 54-69.

      81-84 Unclear why these species samples were either at exponential or stationary phases. The growth stage can influence the proportion of rRNA and other transcripts in the population. 

      Thank you for raising this important point about the growth phases of the bacterial samples used in our study. We appreciate the opportunity to clarify our experimental design. To evaluate the performance of RiboD-PETRI, we designed a comprehensive assessment of rRNA depletion efficiency under diverse physiological conditions, specifically contrasting exponential and stationary phases. This approach allows us to understand how these different growth states impact rRNA depletion efficacy. Additionally, we included a variety of bacterial species, encompassing both Gram-negative and Gram-positive organisms, to ensure that our findings are broadly applicable across different types of bacteria. By incorporating these variables, we aim to provide insights into the robustness and reliability of the RiboD-PETRI method in various biological contexts. We have included this rationale in our Results section (lines 99-106), providing readers with a clear understanding of our experimental design choices.

      86 "compared TO PETRI-seq " (typo). 

      We have corrected this typo in our manuscript.

      94 "gene expression collectively" rephrase. Probably this means coverage of the entire gene set across all cells. Same for downstream usage of the phrase. 

      Thank you for pointing out this ambiguity in our phrasing. Your interpretation of our intended meaning is accurate. We have rephrased the sentence as “transcriptome-wide gene coverage across the cell population”.

      97 What were the median UMIs for the 30,000 cell library {greater than or equal to}15 UMIs? Same question for the other datasets. This would reflect a more comparable statistic with previous studies than the top 3% of the cells for example, since the distributions of the single-cell UMIs typically have a long tail. 

      Thank you for this insightful question and for pointing out the importance of providing more comparable statistics. We agree that median values offer a more robust measure of central tendency, especially for datasets with long-tailed distributions, which are common in single-cell studies. The suggestion to include median Unique Molecular Identifier (UMI) counts would indeed provide a more comparable statistic with previous studies. We have analyzed the median UMIs for our libraries as follows and revised our manuscript according to the analysis (lines 126-130, 133-136, 139-142 and 175-180).

      (1) Median UMI count in Exponential Phase E. coli:

      Total: 102 UMIs per cell

      Top 1,000 cells: 462 UMIs per cell

      Top 5,000 cells: 259 UMIs per cell

      Top 10,000 cells: 193 UMIs per cell

      (2) Median UMI count in Stationary Phase S. aureus:

      Total: 142 UMIs per cell

      Top 1,000 cells: 378 UMIs per cell

      Top 5,000 cells: 207 UMIs per cell

      Top 8,000 cells: 167 UMIs per cell

      (3) Median UMI count in Exponential Phase C. crescentus:

      Total: 182 UMIs per cell

      Top 1,000 cells: 2,190 UMIs per cell

      Top 5,000 cells: 662 UMIs per cell

      Top 10,000 cells: 225 UMIs per cell

      (4) Median UMI count in Static E. coli Biofilm:

      Total of Replicate 1: 34 UMIs per cell

      Total of Replicate 2: 52 UMIs per cell

      Top 1,621 cells of Replicate 1: 283 UMIs per cell

      Top 3,999 cells of Replicate 2: 239 UMIs per cell
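
      Summary statistics of the kind listed above can be sketched as follows: keep cells passing a UMI threshold, optionally restrict to the top N cells by UMI count, and take the median. Whether the "Total" values above were computed before or after the ≥15 UMI filter is not stated in the response, so the default threshold, the function name, and the toy counts are assumptions for illustration.

```python
import statistics


def median_umis(umis_per_cell, min_umis=15, top_n=None):
    """Median UMIs per cell among cells with >= min_umis, optionally top N."""
    kept = sorted((u for u in umis_per_cell if u >= min_umis), reverse=True)
    if top_n is not None:
        kept = kept[:top_n]         # highest-UMI cells come first
    if not kept:
        raise ValueError("no cells pass the filter")
    return statistics.median(kept)


# Toy per-cell UMI totals (hypothetical values).
toy_umis = [3, 500, 40, 15, 220, 9, 75, 18]

median_all_passing = median_umis(toy_umis)          # cells with >= 15 UMIs
median_top3 = median_umis(toy_umis, top_n=3)        # three highest-UMI cells
```

      Because per-cell UMI distributions are long-tailed, the median of a top-N subset is always at least as high as the median over all passing cells, which is why the two should be reported separately.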

      104-105 The performance metric should again be the median UMIs of the majority of the cells passing the filter (15 mRNA UMIs is reasonable). The top 3-5% are always much higher in resolution because of the heavy tail of the single-cell UMI distribution. It is unclear if the performance surpasses the other methods using the comparable metric. Recommend removing this line. 

      We appreciate your suggestion regarding the use of median UMIs as a more appropriate performance metric, and we agree that comparing the top 3-5% of cells can be misleading due to the heavy tail of the single-cell UMI distribution. We have removed the line in question (104-105) that compares our method's performance based on the top 3-5% of cells in the revised manuscript. Instead, we focused on presenting the median UMI counts for cells passing the filter (≥15 mRNA UMIs) as the primary performance metric. This will provide a more representative and comparable measure of our method's performance. We have also revised the surrounding text to reflect this change, ensuring that our claims about performance are based on these more robust statistics (lines 126-130, 133-136, 139-142 and 175-180).

      106-108 The sequencing saturation of the libraries (in %), and downsampling analysis should be added to illustrate this point. 

      Thank you for this suggestion. Adding sequencing saturation and downsampling analysis will help better illustrate our point. Based on your feedback, we have revised our manuscript by adding the following content:

      To provide a thorough evaluation of our sequencing depth and library quality, we performed sequencing saturation analysis on our sequencing samples. The findings reveal that our sequencing saturation is 100% (Fig. 8A & B), indicating that our sequencing depth is sufficient to capture the diversity of most transcripts. To further illustrate the impact of our downstream analysis on the datasets, we have shown the data distribution before and after applying our filtering criteria (Fig. S1B & C). These figures visualize the influence of our filtering process on data quality and distribution. After filtering, we obtain a more refined dataset with reduced noise and fewer outliers, which enhances the reliability of our downstream analyses.

      We have also ensured that a detailed description of the sequencing saturation method is included in the manuscript to provide readers with a comprehensive understanding of our methodology. We appreciate your feedback and believe these additions significantly improve our work.
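
      For readers unfamiliar with the idea, one common way to estimate sequencing saturation is sketched below: saturation is taken as 1 minus the ratio of unique (cell barcode, gene, UMI) combinations to total reads, and a downsampling curve is obtained by recomputing it on random subsets of reads. This definition, the function names, and the toy reads are assumptions for illustration; the response does not describe the exact method used, which is given in the manuscript.

```python
import random


def saturation(reads):
    """1 - unique/total, where each read is a (cell, gene, UMI) tuple."""
    if not reads:
        raise ValueError("reads must not be empty")
    return 1.0 - len(set(reads)) / len(reads)


def saturation_curve(reads, fractions, seed=0):
    """Saturation recomputed on random subsamples of the reads."""
    rng = random.Random(seed)       # fixed seed keeps the sketch reproducible
    curve = []
    for frac in fractions:
        n = max(1, round(frac * len(reads)))
        subsample = rng.sample(reads, n)    # sampling without replacement
        curve.append((frac, saturation(subsample)))
    return curve


# Toy library: 10 reads that collapse to 3 unique molecules.
toy_reads = (
    [("cell_1", "geneA", "AAC")] * 6
    + [("cell_1", "geneB", "GTT")] * 3
    + [("cell_2", "geneA", "CCA")]
)
full_saturation = saturation(toy_reads)
curve = saturation_curve(toy_reads, [0.2, 0.5, 1.0])
```

      A curve that flattens as the sampled fraction approaches 1 suggests that additional sequencing would mostly return duplicates of molecules already observed.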

      122: Please provide more details about the biofilm setup, including the media used. I did not find them in the methods. 

      We appreciate your attention to detail, and we agree that this information is crucial for the reproducibility of our experiments. We propose to add the following information to our methods section (lines 311-318):

      "For the biofilm setup, bacterial cultures were grown overnight. The next day, we diluted the culture 1:100 in a petri dish. We added 2ml of LB medium to the dish. If the bacteria contain a plasmid, the appropriate antibiotic needs to be added to LB. The petri dish was then incubated statically in a growth chamber for 24 hours. After incubation, we performed imaging directly under the microscope. The petri dishes used were glass-bottom dishes from Biosharp (catalog number BS-20-GJM), allowing for direct microscopic imaging without the need for cover slips or slides. This setup allowed us to grow and image the biofilms in situ, providing a more accurate representation of their natural structure and composition.​"

      125: "sequenced 1,563 reads" missing "with" 

      Thank you for correcting our grammar. We have revised the phrase as “sequenced with 1,563 reads”.

      126: "283/239 UMIs per cell" unclear. 283 and 239 UMIs per cell per replicate, respectively? 

      Thank you for correcting our grammar. We have revised the phrase as “283 and 239 UMIs per cell per replicate, respectively” (line 184).

      Figure 1D: Please indicate where the comparison datasets are from. 

      We appreciate your question regarding the source of the comparison datasets in Figure 1D. All data presented in Figure 1D are from our own sequencing experiments. We did not use data from other publications for this comparison. Specifically, we performed sequencing on E. coli cells in the exponential growth phase using three different library preparation methods: RiboD-PETRI, PETRI-seq, and RNA-seq. The data shown in Figure 1D represent a comparison of UMIs and/or reads correlations obtained from these three methods. All sequencing results have been uploaded to the Gene Expression Omnibus (GEO) database. The accession number is GSE260458. We have updated the figure legend for Figure 1D to clearly state that all datasets are from our own experiments, specifying the different methods used.

      Figure 1I, 2D: Unable to interpret the color block in the data. 

      We apologize for any confusion regarding the interpretation of the color blocks in Figures 1I and 2D (now Figures 2E and 3E). The color blocks in these figures represent the p-values of the data points. The color scale ranges from red to blue. Red colors indicate smaller p-values, suggesting higher statistical significance and more reliable results. Blue colors indicate larger p-values, suggesting lower statistical significance and less reliable results. We have updated the figure legends for both Figure 2E and Figure 3E to include this explanation of the color scale. Additionally, we have added a color legend to each figure to make the interpretation more intuitive for readers.

      Figure1H and 2C: Gene names should be provided where possible. The locus tags are highly annotation-dependent and hard to interpret. Also, a larger size figure should be helpful. The clusters 2 and 3 in 2C are the most important, yet because they have few cells, very hard to see in this panel. 

      We appreciate your suggestions for improving the clarity and interpretability of Figures 1H and 2C (now Figures 2D and 3D). We have replaced the locus tags with gene names where possible in both figures. We have increased the size of both figures to improve visibility and readability. We have also made Clusters 2 and 3 in Figure 3D more prominent in the revised figure. Despite their smaller cell count, we recognize their importance and have adjusted the visualization to ensure they are clearly visible. We believe these modifications will significantly enhance the clarity and informativeness of Figures 2D and 3D.

      (3) Questions to consider further expanding on, by more analyses or experiments and in the discussion: 

      What are the explanations for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels? How could a phosphodiesterase lead to increased c-di-GMP levels? 

      We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI was predicted to be a phosphodiesterase responsible for c-di-GMP degradation. This prediction is based on sequence analysis where PdeI contains an intact EAL domain known for degrading c-di-GMP. However, it is noteworthy that PdeI also contains a divergent GGDEF domain, which is typically associated with c-di-GMP synthesis (Fig S8). This dual-domain architecture suggests that PdeI may engage in complex regulatory roles. Previous studies have shown that the knockout of the major phosphodiesterase PdeH in E. coli leads to the accumulation of c-di-GMP. Further, a point mutation in PdeI's divergent GGDEF domain (G412S) in this PdeH knockout strain resulted in decreased c-di-GMP levels (2), implying that the wild-type GGDEF domain in PdeI contributes to the maintenance or increase of c-di-GMP levels in the cell. Importantly, our single-cell experiments showed a positive correlation between PdeI expression levels and c-di-GMP levels (Response Fig. 9B). In this revision, we also constructed a PdeI(G412S)-BFP mutant strain. Notably, our observations of this strain revealed that c-di-GMP levels remained constant despite increasing BFP fluorescence, which serves as a proxy for PdeI(G412S) expression levels (Fig. 4D). This experimental evidence, along with domain analysis, suggests that PdeI could contribute to c-di-GMP synthesis, rebutting the notion that it solely functions as a phosphodiesterase. HPLC LC-MS/MS analysis further confirmed that PdeI overexpression, induced by arabinose, led to an upregulation of c-di-GMP levels (Fig. 4E). These results strongly suggest that PdeI plays a significant role in upregulating c-di-GMP levels. Our further analysis revealed that PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results demonstrating that PdeI is a membrane-associated protein, we hypothesize that PdeI functions as a sensor that integrates environmental signals with c-di-GMP production under complex regulatory mechanisms.

      We have also included this explanation (lines 193-217) and the supporting experimental data (Fig. 4D & 4J) in our manuscript to clarify this important point. Thank you for highlighting this apparent contradiction, as it has allowed us to provide a more comprehensive explanation of our findings.

      What about the rest of the genes in cluster 2 of the biofilm? They should be used to help interpret the association between PdeI and c-di-GMP. 

      We understand your interest in the other genes present in cluster 2 of the biofilm and their potential relationship to PdeI and c-di-GMP. After careful analysis, we have determined that the other marker genes in this cluster do not have a significant impact on biofilm formation. Furthermore, we have not found any direct relationship between these genes and c-di-GMP or PdeI. Our focus on PdeI in this cluster is due to its unique and significant role in c-di-GMP regulation and biofilm formation, as demonstrated by our experimental results. While the other genes in this cluster may be co-expressed, their functions appear to be unrelated to the PdeI and c-di-GMP pathway we are investigating. We chose not to elaborate on these genes in our main discussion as they do not contribute directly to our understanding of the PdeI and c-di-GMP association. Instead, we could include a brief mention of these genes in the manuscript, noting that they were found to be unrelated to the PdeI-c-di-GMP pathway. This would provide a more comprehensive view of the cluster composition while maintaining focus on the key findings related to PdeI and c-di-GMP.

      Author response image 2.

      Protein-protein interactions of marker genes in cluster 2 of the 24-hour static E. coli biofilm data.

      A verification is needed that the protein fusion to PdeI functional/membrane localization is not due to protein interactions with fluorescent protein fusion. 

      We appreciate your concern regarding the potential impact of the fluorescent protein fusion on the functionality and membrane localization of PdeI. It is crucial to verify that the observed effects are attributable to PdeI itself and not an artifact of its fusion with the fluorescent protein. To address this matter, we have incorporated a control group expressing only the fluorescent protein BFP (without the PdeI fusion) under the same promoter. This experimental design allows us to differentiate between effects caused by PdeI and those potentially arising from the fluorescent protein alone.

      Our results revealed the following key observations:

      (1) Cellular Localization: The GFP alone exhibited a uniform distribution in the cytoplasm of bacterial cells, whereas the PdeI-GFP fusion protein was specifically localized to the membrane (Fig. 4C).

      (2) Localization in the Biofilm Matrix: BFP-positive cells were distributed throughout the entire biofilm community. In contrast, PdeI-BFP positive cells localized at the bottom of the biofilm, where cell-surface adhesion occurs (Fig 4F).

      (3) c-di-GMP Levels: Cells with high levels of BFP displayed no increase in c-di-GMP levels. Conversely, cells with high levels of PdeI-BFP exhibited a significant increase in c-di-GMP levels (Fig. 4D).

      (4) Persister Cell Ratio: Cells expressing high levels of BFP showed no increase in persister ratios, while cells with elevated levels of PdeI-BFP demonstrated a marked increase in persister ratios (Fig. 4J).

      These findings from the control experiments have been included in our manuscript (lines 193-244, Fig. 4C, 4D, 4F, 4G and 4J), providing robust validation of our results concerning the PdeI fusion protein. They confirm that the observed effects are indeed due to PdeI and not merely artifacts of the fluorescent protein fusion.

      (1) Vrabioiu, A. M. & Berg, H. C. Signaling events that occur when cells of Escherichia coli encounter a glass surface. Proceedings of the National Academy of Sciences of the United States of America 119 (2022). https://doi.org/10.1073/pnas.2116830119

      (2) Reinders, A. et al. Expression and Genetic Activation of Cyclic Di-GMP-Specific Phosphodiesterases in Escherichia coli. J Bacteriol 198, 448-462 (2016). https://doi.org/10.1128/JB.00604-15

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The main goal of the paper was to identify signals that activate FLP-1 release from AIY neurons in response to H2O2, previously shown by the authors to be an important oxidative stress response in the worm. 

      Strengths: 

      This study builds upon the authors' previous work (Jia and Sieburth 2021) by further elucidating the gut-derived signaling mechanisms that coordinate the organism-wide antioxidant stress response in C. elegans. 

      By detailing how environmental cues like oxidative stress are transduced into gut-derived peptidergic signals, this study represents a valuable advancement in understanding the integrated physiological responses governed by the gut-brain axis. 

      This work provides valuable mechanistic insights into the gut-specific regulation of the FLP-2 peptide signal. 

      Weaknesses: 

      Although the authors identify intestinal FLP-2 as the endocrine signal important for regulating the secretion of the neuronal antioxidant neuropeptide, FLP-1, there is no effort made to identify how FLP-2 levels regulate FLP-1 secretion or identify whether this regulation is occurring directly through the AIY neuron or indirectly. This is brought up in the discussion, but identifying a target for FLP-2 in this pathway seems like a crucial missing piece of information in characterizing this pathway. 

      We agree that this is an important question. Specifically, identifying the FLP-2 receptor and its site of action is a major priority. Since there are at least four different receptors that have been functionally or physically linked to FLP-2 and there are at least three FLP-2 peptides, unraveling the components acting directly downstream of FLP-2 will require further investigation that we feel is beyond the scope of this current study. We have added a new panel (Fig 1E) addressing the requirements for flp-2 signaling on peroxide production in AIY. These results provide new mechanistic insight into how flp-2 impacts signaling in AIY and a new interpretation of these results has been added to the discussion.

      Reviewer #2 (Public Review): 

      Summary: 

      The core findings demonstrate that the neuropeptide-like protein FLP-2, released from the intestine of C. elegans, is essential for activating the intestinal oxidative stress response. This process is mediated by endogenous hydrogen peroxide (H2O2), which is produced in the mitochondrial matrix by superoxide dismutases SOD-1 and SOD-3. H2O2 facilitates FLP-2 secretion through the activation of protein kinase C family member pkc-2 and the SNAP25 family member aex-4. The study further elucidates that FLP-2 signaling potentiates the release of the antioxidant FLP-1 neuropeptide from neurons, highlighting a bidirectional signaling mechanism between the intestine and the nervous system. 

      Strengths: 

      This study presents a significant contribution to the understanding of the gut-brain axis and its role in oxidative stress response and significantly advances our understanding of the intricate mechanisms underlying the gut-brain axis's role in oxidative stress response. By elucidating the role of FLP-2 and its regulation by H2O2, the study provides insights into the molecular basis of inter-tissue communication and antioxidant defense in C. elegans. These findings could have broader implications for understanding similar pathways in more complex organisms, potentially offering new targets for therapeutic intervention in diseases related to oxidative stress and aging. 

      Weaknesses: 

      (1) The experimental techniques employed in the study were somewhat simple and could benefit from the incorporation of more advanced methodologies. 

      Thank you for your comment.

      (2) The weak identification of the key receptors mediating the interaction between FLP-2 and AIY neurons, as well as the receptors in the gut that respond to FLP-1. 

      We agree that this is an important question. Specifically, identifying the FLP-2 receptor and its site of action is a major priority. Since there are at least four different receptors that have been functionally or physically linked to FLP-2 and there are at least three FLP-2 peptides, unraveling the components acting directly downstream of FLP-2 will require further investigation that we feel is beyond the scope of this current study.

      (3) The study could be improved by incorporating a sensor for the direct measurement of hydrogen peroxide levels. 

      We have added a new panel (Fig 1E) addressing the requirements for flp-2 signaling on peroxide production in AIY using the genetically encoded peroxide sensor HyPer7. These results provide new mechanistic insight into how flp-2 impacts signaling in AIY and a new interpretation of these results has been added to the discussion. In addition, we have used HyPer7 to measure peroxide levels in the intestinal mitochondrial matrix and outer membrane (Figs 3, 4, 5, 6).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The major missing link in the study is how FLP-2 affects FLP-1 release from AIY: is the effect direct and does it require the previously described FLP-2 receptor FRPR-18? This possibility is discussed extensively (L511-528), so it is odd that the effect of an frpr-18 mutation was not tested (or, if it was tested, that the results were not reported). If the authors haven't done this experiment (despite doing many less critical experiments) it would be good to know why. 

      We agree that this is an important question. Specifically, identifying the FLP-2 receptor and its site of action is a major priority. Since there are at least four different receptors that have been functionally or physically linked to FLP-2 and there are at least three FLP-2 peptides, unraveling the components acting directly downstream of FLP-2 will require further investigation that we feel is beyond the scope of this current study. We have added a new panel (Fig 1E) addressing the requirements for flp-2 signaling on peroxide production in AIY. These results provide new mechanistic insight into how flp-2 impacts signaling in AIY and a new interpretation of these results has been added to the discussion.

      Results:

      “To address how flp-2 signaling regulates FLP-1 secretion from AIY, we examined H2O2 levels in AIY using a mitochondrially targeted pH-stable H2O2 sensor HyPer7 (mito-HyPer7, Pak et al. 2020). Mito-HyPer7 adopted a punctate pattern of fluorescence in AIY axons, and the average fluorescence intensity of axonal mito-HyPer7 puncta increased about two-fold following 10-minute juglone treatment (Fig 1E), in agreement with our previous studies using HyPer (Jia and Sieburth 2021), confirming that juglone rapidly increases mitochondrial AIY H2O2 levels. flp-2 mutations had no significant effects on the localization or the average intensity of mito-HyPer7 puncta in AIY axons either in the absence of juglone, or in the presence of juglone (Fig 1E), suggesting that flp-2 signaling promotes FLP-1 secretion by a mechanism that does not increase H2O2 levels in AIY. Consistent with this, intestinal overexpression of flp-2 had no effect on FLP-1::Venus secretion in the absence of juglone, but significantly enhanced the ability of juglone to increase FLP-1 secretion (Fig. 1D). We conclude that both elevated mitochondrial H2O2 levels and intact flp-2 signaling from the intestine are necessary to increase FLP-1 secretion from AIY.”

      More minor comments/suggestions: 

      Line 172: No justification is given as to why the authors chose to focus on flp-2 over the other potential candidates identified in their RNAi screen. 

      We are currently examining the other neuropeptide hits from the screen, but we have no additional phenotypes to report.

      Line 189: An explanation for the use of gDNA as opposed to cDNA should be given. 

      We have changed the text in the Results section as follows:

      “Expressing a flp-2 genomic DNA (gDNA) fragment (containing both the flp-2a and flp-2b isoforms that arise by alternative splicing) specifically in the nervous system failed to rescue the FLP-1::Venus defects of flp-2 mutants, whereas expressing flp-2 selectively in the intestine fully restored juglone-induced FLP-1::Venus secretion to flp-2 mutants (Fig. 1D).”

      Line 249-253: nlp-40 and nlp-27 were not implicated in contributing to juglone toxicity in the RNAi screen performed previously by the authors, so it is unclear why both of these peptides are investigated beyond simply being released from the intestine. Confusingly, while Figure S2D shows no overlap between NLP-40 and FLP2, NLP-27 is omitted from the analysis. 

      We have clarified that these peptides are not implicated in stress responses, providing a clearer rationale for why they serve as controls for specificity.

      “Third, nlp-40 and nlp-27 encode neuropeptide-like proteins that are released from the intestine, but are not implicated in stress responses (Liu et al. 2023; Taylor et al. 2021; Wang et al. 2013), and juglone treatment had no detectable effects on coelomocyte fluorescence in animals expressing intestinal NLP-40::Venus or NLP-27::Venus fusion proteins (Fig. S2B and C), and NLP-40::mTur2 puncta did not overlap with FLP-2::Venus puncta in the intestine (Fig. S2D).”

      Line 262: A more detailed description of juglone's mechanism of action would be welcome here. Is juglone expected to act only in intestinal cells, or is its function more pervasive? 

      We have added more detail:

      “Juglone generates superoxide anion radicals (Ahmad and Suzuki 2019; Paulsen and Ljungman 2005) and juglone treatment of C. elegans increases ROS levels (de Castro, Hegi de Castro, and Johnson 2004) likely by promoting the global production of mitochondrial superoxide. Superoxide can then be rapidly converted into H2O2 by superoxide dismutase.”

      Line 414: Justification for why expulsion frequency is used here to quantify NLP-40 secretion is required, particularly because NLP-40::Venus was already used to quantify NLP-40 secretion via the coelomocyte fluorescence method in the experiments contributing to Figure S2. 

      We used expulsion frequency here because (1) it is an easier assay than the coelomocyte assay and (2) it is a functional assay. Defective NLP-40 exocytosis manifests as reduced expulsion frequency; therefore, if NLP-40 secretion is defective in pkc-2 mutants, pkc-2 mutants should exhibit defects in expulsion frequency.

      We have clarified this point:

      “To determine whether pkc-2 can regulate the intestinal secretion of other peptides that are not associated with oxidative stress, we examined expulsion frequency, which is a measure of NLP-40 secretion (Mahoney et al. 2008; Wang et al. 2013).”

      Line 478: The discussion of neuronally-secreted kisspeptin in this context does not seem relevant as this paper has focused on intestinal peptide secretion. 

      We have removed this sentence:

      In mammals, release of the RF-amide neuropeptide kisspeptin from the anteroventral periventricular nucleus (AVPV) regulates reproduction by inducing the release of gonadotropins via its stimulatory action on GnRH neurons (Han et al. 2005).

      Line 526: DMSR-18 seems to be a typo. Possibly meant FRPR-8, as this is another FLP-2-activated GPCR identified in the screen (though notably, FRPR-8 is only activated by one of the two FLP-2 peptide products). On that note, DMSR-1 has two isoforms, and only one of them is activated by FLP-2 (and only one of the two FLP-2 peptides). This seems relevant to discuss. 

      We have corrected the text and we have added to the discussion the number of FLP-2 peptides:

      “In addition, certain FLP-2-derived peptides (of which there are at least three) can bind to the GPCRs DMSR-1, or FRPR-8 in transfected cells (Beets et al. 2023). Identifying the relevant FLP-2 peptide(s), the FLP-2 receptor and its site of action will help to define the circuit used by intestinal flp-2 to promote FLP-1 release from AIY.” 

      Line 534: An explanation or speculation into why this integration might be necessary would be welcome here. 

      We have edited this paragraph:

      “FLP-1 release from AIY is positively regulated by H2O2 generated from mitochondria (Jia and Sieburth 2021). Here we showed that H2O2-induced FLP-1 release requires intestinal flp-2 signaling. However, flp-2 does not appear to promote FLP-1 secretion by increasing H2O2 levels in AIY (Fig 1E), and flp-2 signaling is not sufficient to promote FLP-1 secretion in the absence of H2O2 (Fig. 1D). These results point to a model whereby at least two conditions must be met in order for AIY to increase FLP-1 secretion: an increase in H2O2 levels in AIY itself, and an increase in flp-2 signaling from the intestine. Thus AIY integrates stress signals from both the nervous system and the intestine to activate the intestinal antioxidant response through FLP-1 secretion. The requirement of signals from multiple tissues for FLP-1 secretion may function to limit the activation of SKN-1, since unregulated SKN-1 activation can be detrimental to organismal health (Turner, Ramos, and Curran 2024).”

      Line 569: Should specify what these candidates are. 

      There are 11 proteins with thioredoxin fold domains. We modified the sentence to list one of them.

      “There are several thioredoxin-domain containing proteins in addition to trx-3 in the C. elegans genome that could be candidates for this role (e.g. trx-5 and others).”

      Line 660: Details about whether the M9 control had an equivalent amount of DMSO as the juglone+M9 condition is required. 

      We have performed toxicity and neuropeptide release assays comparing M9, DMSO, and juglone treatments, and we have included these new data in Fig. S1C, D and S2E.

      Methods:

      “A stock solution of 50 mM juglone in DMSO was freshly made on the same day of the liquid toxicity assay. A 120 μM working solution of juglone in M9 buffer was prepared from the stock solution before treatment. Around 60-80 synchronized adult animals were transferred into a 1.5 mL Eppendorf tube with fresh M9 buffer and washed three times, and a final wash was done with either the working solution of juglone or M9. DMSO at the concentrations present in juglone-treated animals does not contribute to toxicity since DMSO treatment alone caused no significant change in survival compared to M9-treated controls (Fig. S1C).

      For coelomocyte imaging, L4 stage animals were transferred into fresh M9 buffer on a cover slide and washed six times with M9 before being exposed to 300 μM juglone in M9 buffer (diluted from freshly made 50 mM stock solution), 1 mM H2O2 in M9 buffer, or M9 buffer. DMSO at the concentrations present in juglone-treated animals does not alter neuropeptide secretion since DMSO treatment alone caused no significant change in FLP-1::Venus or FLP-2::Venus coelomocyte fluorescence compared to M9-treated controls (Fig. S1D and S2E).”
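      The dilutions described in these methods follow from C1·V1 = C2·V2. The sketch below works through the arithmetic for the 50 mM stock and the 120 μM and 300 μM working solutions; the 1 mL final volume and the function names are assumptions made for illustration and are not stated in the methods.

```python
def stock_volume_ul(stock_mM, working_uM, final_volume_ul):
    """Volume of stock (in microliters) needed, from C1 * V1 = C2 * V2."""
    stock_uM = stock_mM * 1000.0
    if working_uM > stock_uM:
        raise ValueError("working concentration exceeds stock concentration")
    return working_uM * final_volume_ul / stock_uM


def vehicle_percent(stock_volume, final_volume):
    """Vehicle (here DMSO) carried over with the stock, as percent v/v."""
    return 100.0 * stock_volume / final_volume


FINAL_VOLUME_UL = 1000.0  # assumed 1 mL of working solution (hypothetical)

toxicity_stock_ul = stock_volume_ul(50.0, 120.0, FINAL_VOLUME_UL)
imaging_stock_ul = stock_volume_ul(50.0, 300.0, FINAL_VOLUME_UL)
toxicity_dmso_pct = vehicle_percent(toxicity_stock_ul, FINAL_VOLUME_UL)
imaging_dmso_pct = vehicle_percent(imaging_stock_ul, FINAL_VOLUME_UL)
```

      Under these assumptions, the working solutions need about 2.4 μL and 6 μL of stock per mL, which corresponds to roughly 0.24% and 0.6% DMSO (v/v). A DMSO-only control would be matched to these concentrations.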

      Line 1191: Should be FLP-1:Venus in AIY, not the intestine  

      Corrected.

      In general, the significance of reporting in the figures is very unclear. "a, b, c" to report statistical analysis is confusing in the figure legends, and also unnecessary when they denote non-significance. There are some cases where it is reported that a symbol (eg. ***) denotes statistical significance, but there is no indication of what level of statistical significance the symbol represents (for example, in Figures 2C and 2D) 

      Levels of significance are summarized at the end of the legend for each figure unless indicated for specific symbols (for example, Fig. 1C). We have edited this figure legend: 

      “E Representative images and quantification of fluorescence of matrix-targeted HyPer7 in the axon of AIY following M9 or juglone treatment for 10 min. Arrowheads denote puncta marked by MLS::HyPer7 fusion proteins (Excitation: 500 and 400 nm; emission: 520 nm). Ratio of images taken with 500 nm (GFP) and 400 nm (CFP) for excitation was used to measure H2O2 levels. Unlined *** and ns denote statistical analysis compared to “wild type”. n = 25, 25, 25, 25 independent animals. Scale bar: 10 μm.

      F Representative images and quantification of average fluorescence in the posterior region of transgenic animals expressing Pgst-4::gfp after 4 h vehicle M9 or juglone exposure. Asterisks mark the intestinal region used for quantification. Pgst-4::gfp expression in the body wall muscles, which appears as fluorescence on the edge of animals in some images, was not quantified. Unlined *** and ns denote statistical analysis compared to “wild type”; unlined ## and ### denote statistical analysis compared to “wild type+juglone”. n = 25, 26, 25, 25, 25, 25, 25, 25 independent animals. Scale bar: 10 μm.”
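      The ratiometric readout described in the legend for panel E (500 nm over 400 nm excitation) can be sketched as follows. This is only an illustration: the background subtraction step, the function names, and the intensity values are assumptions and do not come from the manuscript.

```python
def hyper7_ratio(intensity_500, intensity_400, background_500=0.0, background_400=0.0):
    """Ratio of background-corrected 500 nm to 400 nm excitation signals."""
    numerator = intensity_500 - background_500
    denominator = intensity_400 - background_400
    if denominator <= 0:
        raise ValueError("400 nm signal must exceed its background")
    return numerator / denominator


def mean_ratio(puncta):
    """Average ratio over (intensity_500, intensity_400) pairs, one per punctum."""
    if not puncta:
        raise ValueError("no puncta supplied")
    ratios = [hyper7_ratio(i500, i400) for i500, i400 in puncta]
    return sum(ratios) / len(ratios)


def fold_change(treated_puncta, control_puncta):
    """Mean ratio after treatment relative to the mean control ratio."""
    return mean_ratio(treated_puncta) / mean_ratio(control_puncta)


# Hypothetical puncta intensities in arbitrary units (not data from the study).
control = [(100.0, 200.0), (120.0, 200.0)]
juglone = [(220.0, 200.0), (220.0, 200.0)]

change = fold_change(juglone, control)
```

      Taking a ratio of the two excitation channels, rather than a single-channel intensity, makes the measurement less sensitive to differences in sensor expression between animals.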

      Figure 2C: It is unclear which conditions have H2O2 treatment (as described in the legend). There is also no mention of what ### indicates. 

      Levels of significance for ### are summarized at the end of the legend. No H2O2 treatment was performed in this assay. We have edited this figure legend: 

      “C. Representative images and quantification of average coelomocyte fluorescence of the indicated mutants expressing FLP-2::Venus fusion proteins in the intestine following M9 or juglone treatment for 10 min. Unlined *** and ns denote statistical analysis compared to “wild type”. n = 29, 25, 24, 30, 23, 30, 25, 25, 25 independent animals. Scale bar: 5 μm.”

      Figure 2D: It is not previously mentioned that M9 condition contains DMSO, as implied by the legend. 

      We have edited this figure legend:

      “D. Quantification of average coelomocyte fluorescence of transgenic animals expressing FLP-2::Venus fusion proteins in the intestine following treatment of fresh M9 buffer or the indicated stressors for 10min. Unlined *** denotes statistical analysis compared to “M9”. n = 23, 25, 25 independent animals.”  

      Figure 3J: The y-axis label should more clearly describe the ratio being measured. 

      We have updated the panel and this figure legend: 

      “J. Schematic, representative images and quantification of fluorescence in the posterior region of the indicated transgenic animals co-expressing mitochondrial matrix targeted HyPer7 (matrix-HyPer7) or mitochondrial outer membrane targeted HyPer7 (OMM-HyPer7) with TOMM-20::mCherry following M9, juglone or H2O2 treatment. Ratio of images taken with 500 nm (GFP) and 400 nm (CFP) for excitation and 520 nm for emission was used to measure H2O2 levels. Unlined *** and ns denote statistical analysis compared to “wild type”; unlined ## denotes statistical analysis compared to “wild type+juglone”. (top) n = 20, 20, 18, 20, 19, 19, 20, 20 independent animals. (bottom) n = 20, 20, 19, 20, 20, 20, 20, 20 independent animals. Scale bar: 5 μm.” 

      Figure S3A: *** is mislabelled. It should be a comparison to wildtype. 

      We have edited this figure legend: 

      “A. Quantification of average coelomocyte fluorescence of the indicated mutants expressing FLP-2::Venus fusion proteins in the intestine following M9 or juglone treatment for 10min. Unlined *** denotes statistical analysis compared to “wild type”; ### and ns denote statistical analysis compared to “wild type+juglone”. n = 29, 27, 29, 27, 25, 26, 24 independent animals.”  

      Reviewer #2 (Recommendations For The Authors): 

      (1) The localization experiments could benefit from the application of ultra-high-resolution fluorescence microscopy. This would allow for a more detailed analysis of the spatial distribution of SOD-1/3::GFP in relation to mitochondria-targeted TOMM-20::mCherry fusion proteins in the posterior intestinal region of transgenic animals. 

      We agree that high-resolution microscopy would be a great way to more precisely localize SOD proteins relative to the mitochondria, and this would enhance understanding of the source of peroxide in this system. We do not conduct this type of microscopy in the lab, so this approach would require a collaboration with a lab that is set up for this. Thus we feel that this is beyond the scope of the current study.  

      (2) The paper may note the challenge of directly measuring mitochondrial H2O2 concentrations. However, advancements in chemical or fluorescent sensors for H2O2 detection within mitochondria could provide more direct evidence of its role in FLP-2 secretion. 

      We have considered using chemical sensors, but many are either not efficiently taken up by worms (the skin is largely impermeable to all but the most hydrophobic molecules), or they would label peroxide indiscriminately in all tissues making detection specifically in the intestine challenging. We have had good luck with genetically encoded peroxide sensors since they provide tissue specificity and good spatial resolution depending on where we target them. We have added imaging results for HyPer7 in the AIY neuron to Figure 1E. 

      Results:

      “To address how flp-2 signaling regulates FLP-1 secretion from AIY, we examined H2O2 levels in AIY using a mitochondrially targeted pH-stable H2O2 sensor HyPer7 (mito-HyPer7, Pak et al. 2020). Mito-HyPer7 adopted a punctate pattern of fluorescence in AIY axons, and the average fluorescence intensity of axonal mito-HyPer7 puncta increased about two-fold following 10 minute juglone treatment (Fig 1E), in agreement with our previous studies using HyPer (Jia and Sieburth 2021), confirming that juglone rapidly increases mitochondrial AIY H2O2 levels. flp-2 mutations had no significant effects on the localization or the average intensity of mito-HyPer7 puncta in AIY axons either in the absence of juglone, or in the presence of juglone (Fig 1E), suggesting that flp-2 signaling promotes FLP-1 secretion by a mechanism that does not increase H2O2 levels in AIY. Consistent with this, intestinal overexpression of flp-2 had no effect on FLP-1::Venus secretion in the absence of juglone, but significantly enhanced the ability of juglone to increase FLP-1 secretion (Fig. 1D). We conclude that both elevated mitochondrial H2O2 levels and intact flp-2 signaling from the intestine are necessary to increase FLP-1 secretion from AIY.”

      (3) To confirm the activation of AIY neurons by FLP-2, measuring calcium activity in these neurons may be a robust approach. It would be beneficial to determine if synthetic FLP-2 can activate AIY neurons and subsequently induce an intestinal antioxidant response. 

      This is a great idea. We have begun to examine GCaMP fluorescence in AIY and we see responses to oxidative stressors. We think that this data is too preliminary at the moment to include here.  

      (4) The identification of the key receptors mediating the interaction between FLP-2 and AIY neurons, as well as the receptors in the gut that respond to FLP-1, would complete the signaling pathway and strengthen the study's conclusions. 

      We agree that this is an important question. Specifically, identifying the FLP-2 receptor and its site of action is a major priority. Since there are at least four different receptors that have been functionally or physically linked to FLP-2 and there are at least three FLP-2 peptides, unraveling the components acting directly downstream of FLP-2 will require further investigation that we feel is beyond the scope of this current study.  

      (5) Investigating whether direct manipulation of AIY neurons, through methods such as optogenetic activation or inhibition, can trigger the gut's antioxidant response would provide insight into the functional relevance of this neuronal activity. 

      Also an excellent idea. We previously published that Channelrhodopsin activation specifically in AIY indeed increases FLP-1 secretion, but we have not yet examined its effects on antioxidant responses in the intestine.  This may require a more sustained activation of AIY than Channelrhodopsin can provide.

      (6) For the analysis of intestinal Pges-1::GFP fluorescence, specifying the region of interest would enhance the precision of the data and the reproducibility of the results. 

      We analyze the fluorescence intensity of a 16-pixel diameter circle in the posterior intestine (as indicated by the asterisks), and we have added this to the Methods. The edited paragraph reads:

      “For transcriptional reporter imaging, young adult animals with indicated genotype were transferred into a 1.5 mL Eppendorf tube with M9 buffer, washed three times and incubated in M9 buffer or 60 µM working solution of juglone for 1h in dark on rotating mixer before recovering on fresh NGM plates with OP50 for 3h in dark at 20°C. The posterior end of the intestine was imaged with the 60x objective and quantification for average fluorescence intensity of a 16-pixel diameter circle in the posterior intestine was calculated using Metamorph.”
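      As an illustration of the quantification described in this paragraph, the following is a minimal sketch of averaging pixel intensity inside a 16-pixel diameter circular region of interest. It is not the Metamorph procedure itself: the function name `mean_intensity_in_circle`, the image representation (a list of rows), the synthetic image, and the rule for which pixels count as inside the circle are all assumptions made for the example.

```python
def mean_intensity_in_circle(image, center_row, center_col, diameter=16):
    """Return the mean pixel value inside a circular ROI.

    `image` is a list of rows of numbers (hypothetical representation).
    A pixel is counted when its index lies within `diameter / 2` of the
    ROI centre; this inclusion rule is an assumption of the sketch.
    """
    radius_sq = (diameter / 2.0) ** 2
    total, count = 0.0, 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius_sq:
                total += value
                count += 1
    if count == 0:
        raise ValueError("ROI does not overlap the image")
    return total / count


# Synthetic 64x64 image: background of 10 with a brighter 24x24 block of 100.
image = [[10.0] * 64 for _ in range(64)]
for r in range(20, 44):
    for c in range(20, 44):
        image[r][c] = 100.0

# The ROI is centred on the bright block, so it samples only bright pixels.
roi_mean = mean_intensity_in_circle(image, center_row=32, center_col=32)
print(roi_mean)
```

      In practice the ROI centre would be the marked position in the posterior intestine, and the same ROI size would be applied to every animal so that intensities are comparable across genotypes and treatments.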

      (7) Assessing the potential for pharmacological modulation of FLP-2 or H2O2 levels could provide valuable insights into therapeutic strategies aimed at enhancing the oxidative stress response. 

      Agreed.

      (8) For improved clarity, it is suggested that the schematic currently presented in Figure S1A be integrated into Figure 2C, as this would facilitate the reader's comprehension of the experimental design and findings. 

      Moved.

    2. eLife Assessment

      This study presents convincing evidence of the role of an intestine-released neuropeptide, FLP-2, in the oxidative stress response of C. elegans, as well as for the neural circuit pathway that regulates its release in response to sensing reactive oxygen species (i.e., H2O2). These valuable results advance the understanding of gut-brain signaling and the neural circuit basis of behavioral responses to stress.

    3. Reviewer #1 (Public Review):

      Summary:

      The main goal of the paper was to identify signals that activate FLP-1 release from AIY neurons in response to H2O2, previously shown by the authors to be an important oxidative stress response in the worm.

      Strengths:

      This study builds upon the authors' previous work (Jia and Sieburth 2021) by further elucidating the gut-derived signaling mechanisms that coordinate the organism-wide antioxidant stress response in C. elegans.

      By detailing how environmental cues like oxidative stress are transduced into gut-derived peptidergic signals, this study represents a valuable advancement in understanding the integrated physiological responses governed by the gut-brain axis.

      This work provides valuable mechanistic insights into the gut-specific regulation of the FLP-2 peptide signal.

      Weaknesses:

      Although the authors identify intestinal FLP-2 as the endocrine signal important for regulating the secretion of the neuronal antioxidant neuropeptide, FLP-1, there is no effort made to identify how FLP-2 levels regulate FLP-1 secretion or identify whether this regulation is occurring directly through the AIY neuron or indirectly. This is brought up in the discussion, but identifying a target for FLP-2 in this pathway seems like a crucial missing piece of information in characterizing this pathway.

      Comments on revised version:

      In general I think the revision is improved and addresses my comments. It is unfortunate though that the authors did not address my main question (did they test the frpr-18 mutant, and if not, why?). The fact that there are other potentially relevant receptors which bind to some FLP-2 peptides with low affinity is not really a justification not to test the known high-affinity receptor (i.e. FRPR-18).

    4. Reviewer #2 (Public Review):

      Summary:

      The core findings demonstrate that the neuropeptide-like protein FLP-2, released from the intestine of C. elegans, is essential for activating the intestinal oxidative stress response. This process is mediated by endogenous hydrogen peroxide (H2O2), which is produced in the mitochondrial matrix by superoxide dismutases SOD-1 and SOD-3. H2O2 facilitates FLP-2 secretion through the activation of protein kinase C family member pkc-2 and the SNAP25 family member aex-4. The study further elucidates that FLP-2 signaling potentiates the release of the antioxidant FLP-1 neuropeptide from neurons, highlighting a bidirectional signaling mechanism between the intestine and the nervous system.

      Strengths:

      This study makes a significant contribution to our understanding of the intricate mechanisms underlying the gut-brain axis's role in the oxidative stress response. By elucidating the role of FLP-2 and its regulation by H2O2, the study provides insights into the molecular basis of inter-tissue communication and antioxidant defense in C. elegans. These findings could have broader implications for understanding similar pathways in more complex organisms, potentially offering new targets for therapeutic intervention in diseases related to oxidative stress and aging.

      Weaknesses:

      (1) The experimental techniques employed in the study were somewhat simple and could benefit from the incorporation of more advanced methodologies.

      (2) The key receptors mediating the interaction between FLP-2 and AIY neurons, as well as the receptors in the gut that respond to FLP-1, are only weakly identified.

      (3) The study could be improved by incorporating a sensor for the direct measurement of hydrogen peroxide levels.

      Comments on revised version:

      The authors answered my main questions. Although many of the experiments I suggested are in the beginning stages, it is clear that the authors noted that they are critical to understanding the mechanism of action of FLP-2, and hopefully they will continue to push forward and develop more approaches to further identify the receptor mechanism.

    1. eLife Assessment

      This important manuscript describes a creative approach using dual-component gRNAs to create a new class of molecular proximity sensors for genome editing. The authors demonstrate that this tool can be coupled with several different gene editing effectors, showing convincingly that the tool functions as intended. This study not only introduces a first-of-its kind approach, but through careful measurements also enables future further development of the technology.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Choi and co-authors presents "P3 editing", which leverages dual-component guide RNAs (gRNA) to induce protein-protein proximity. They explore three strategies for leveraging prime-editing gRNA (pegRNA) as a dimerization module to create a molecular proximity sensor that drives genome editing, splitting a pegRNA into two parts (sgRNA and petRNA), inserting self-splicing ribozymes within pegRNA, and dividing pegRNA at the crRNA junction. Among these, splitting at the crRNA junction proved the most promising, achieving significant editing efficiency. They further demonstrated the ability to control genome editing via protein-protein interactions and small molecule inducers by designing RNA-based systems that form active gRNA complexes. This approach was also adaptable to other genome editing methods like base editing and ADAR-based RNA editing.

      Strengths:

      The study demonstrates significant advancements in leveraging guide RNA (gRNA) as a dimerization module for genome editing, showcasing its high specificity and versatility. By investigating three distinct strategies-splitting pegRNA into sgRNA and petRNA, inserting self-splicing ribozymes within the pegRNA, and dividing the pegRNA at the repeat junction-the researchers present a comprehensive approach to achieving molecular proximity and reconstituting function. Among these methods, splitting the pegRNA at the repeat junction emerged as the most promising, achieving editing efficiencies up to 76% of the control, highlighting its potential for further development in CRISPR-Cas9 systems. Additionally, the study extends genome editing control by linking protein-protein interactions to RNA-mediated editing, using specific protein-RNA interaction pairs to regulate editing through engineered protein proximity. This innovative approach expands the toolkit for precision genome editing, demonstrating the feasibility of controlling genome editing with enhanced specificity and efficiency.

      Weaknesses:

      The initial experiments with splitting the pegRNA into sgRNA and petRNA showed low editing efficiency, less than 2%. Similarly, inserting self-splicing ribozymes within pegRNA was inefficient, achieving under 2% editing efficiency in all constructs tested, possibly hindered by the prime editing enzyme. The editing efficiency of the crRNA and petracrRNA split at the repeat junction varied, with the most promising configurations only reaching 76% of the control efficiency. The RNA-RNA duplex formation's inefficiency might be due to the lack of additional protein binding, leading to potential degradation outside the Cas9-gRNA complex. Extending the approach to control genome editing via protein-protein interactions introduced complexity, with a significant trade-off between efficiency and specificity, necessitating further optimization. The strategy combining RADARS and P3 editing to control genome editing with specific RNA expression events exhibited high background levels of non-specific editing, indicating the need for improved specificity and reduced leaky expression. Moreover, P3 editing efficiencies are exclusively quantified after transfecting DNA into HEK cells, a strategy that has resulted in past reproducibility concerns for other technologies. Overall, the various methods and combinations require further optimization to enhance efficiency and specificity, especially when integrating multiple synthetic modules.

      Comments on revisions:

      I think the authors have successfully addressed the initial concerns. Their adaptation of the main text and discussion makes the limitations of P3 editing much clearer.

    3. Reviewer #2 (Public review):

      Choi et al. describe a new approach for enabling input-specific CRISPR-based genome editing in cultured cells. While CRISPR-Cas9 is a broadly applied system across all of biology, one limitation is the difficulty in inducing genome editing based on cellular events. A prior study, from the same group, developed ENGRAM - which relies on activity-dependent transcription of a prime editing guide RNA, which records a specific cellular event as a given edit in a target DNA "tape". However, this approach is limited to detection of induced transcription, and does not enable the detection of broader molecular events including protein-protein interactions or exposure to small molecules. As an alternative, this study envisioned engineering the reconstitution of a split prime editing guide RNA (pegRNA) in a protein-protein interaction (PPI)-dependent manner. This would enable location- and content-specific genome editing in a controlled setting.

      Strengths:

      The strengths of this paper include an interesting concept for engineering guide RNAs to enable activity-dependent genome editing in living cells in the future, based on discrete protein-protein interactions (either constitutively, spatially, or chemically induced). Important groundwork is laid down to engineer and improve these guide RNAs in the future (especially the work describing altering the linkers in Supplementary Figure 3 - which provides a path forward).

      Weaknesses:

      In its current state, the editing efficiency appears too low to be applied in physiological settings. Much of the latter work in the paper relies on a LambdaN-MCP direct fusion protein, rather than two interacting protein pairs. Further characterizations in the future, especially varying the transfection amounts/durations/etc of the various components of the system, would be beneficial to improve the system. It will also be important to demonstrate editing at additional sites; to characterize how long the PPI must be active to enable efficient prime editing; and how reversible the reconstitution of the split pegRNA is.

      In the revised version, the authors clearly describe the present limitations of the system in the discussion section, and also highlight specific actions and potential approaches for improving the efficiency of the system for application in biological systems. They also add further insight into why it is advantageous to design engineered guideRNAs, as opposed to engineered Cas9 enzymes, to improve the modularity of the system in the future.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Choi and co-authors presents "P3 editing", which leverages dual-component guide RNAs (gRNA) to induce protein-protein proximity. They explore three strategies for leveraging prime-editing gRNA (pegRNA) as a dimerization module to create a molecular proximity sensor that drives genome editing, splitting a pegRNA into two parts (sgRNA and petRNA), inserting self-splicing ribozymes within pegRNA, and dividing pegRNA at the crRNA junction. Among these, splitting at the crRNA junction proved the most promising, achieving significant editing efficiency. They further demonstrated the ability to control genome editing via protein-protein interactions and small molecule inducers by designing RNA-based systems that form active gRNA complexes. This approach was also adaptable to other genome editing methods like base editing and ADAR-based RNA editing.

      Strengths:

      The study demonstrates significant advancements in leveraging guide RNA (gRNA) as a dimerization module for genome editing, showcasing its high specificity and versatility. By investigating three distinct strategies-splitting pegRNA into sgRNA and petRNA, inserting self-splicing ribozymes within the pegRNA, and dividing the pegRNA at the repeat junction-the researchers present a comprehensive approach to achieving molecular proximity and reconstituting function. Among these methods, splitting the pegRNA at the repeat junction emerged as the most promising, achieving editing efficiencies up to 76% of the control, highlighting its potential for further development in CRISPR-Cas9 systems. Additionally, the study extends genome editing control by linking protein-protein interactions to RNA-mediated editing, using specific protein-RNA interaction pairs to regulate editing through engineered protein proximity. This innovative approach expands the toolkit for precision genome editing, demonstrating the feasibility of controlling genome editing with enhanced specificity and efficiency.

      Weaknesses:

      The initial experiments with splitting the pegRNA into sgRNA and petRNA showed low editing efficiency, less than 2%. Similarly, inserting self-splicing ribozymes within pegRNA was inefficient, achieving under 2% editing efficiency in all constructs tested, possibly hindered by the prime editing enzyme. The editing efficiency of the crRNA and petracrRNA split at the repeat junction varied, with the most promising configurations only reaching 76% of the control efficiency. The RNA-RNA duplex formation's inefficiency might be due to the lack of additional protein binding, leading to potential degradation outside the Cas9-gRNA complex. Extending the approach to control genome editing via protein-protein interactions introduced complexity, with a significant trade-off between efficiency and specificity, necessitating further optimization. The strategy combining RADARS and P3 editing to control genome editing with specific RNA expression events exhibited high background levels of non-specific editing, indicating the need for improved specificity and reduced leaky expression. Moreover, P3 editing efficiencies are exclusively quantified after transfecting DNA into HEK cells, a strategy that has resulted in past reproducibility concerns for other technologies. Overall, the various methods and combinations require further optimization to enhance efficiency and specificity, especially when integrating multiple synthetic modules.

      Thank you for this accurate summary and assessment of the strengths and weaknesses of the P3 editing as it stands. Looking ahead, we agree that further optimizations will be important, as will characterizing the performance of P3 editing in additional cellular contexts. The revised Discussion (see below) now makes these points more clearly.

      Reviewer #2 (Public Review):

      Choi et al. describe a new approach for enabling input-specific CRISPR-based genome editing in cultured cells. While CRISPR-Cas9 is a broadly applied system across all of biology, one limitation is the difficulty in inducing genome editing based on cellular events. A prior study, from the same group, developed ENGRAM - which relies on activity-dependent transcription of a prime editing guide RNA, which records a specific cellular event as a given edit in a target DNA "tape". However, this approach is limited to the detection of induced transcription and does not enable the detection of broader molecular events including protein-protein interactions or exposure to small molecules. As an alternative, this study envisioned engineering the reconstitution of a split prime editing guide RNA (pegRNA) in a protein-protein interaction (PPI)-dependent manner. This would enable location- and content-specific genome editing in a controlled setting.

      The authors explored three different design possibilities for engineering a PPI-dependent split pegRNA. First, they tried splitting pegRNA into a functional sgRNA and corresponding prime editing transRNA, incorporating reverse-complementary dimerization sequences on each guide half. This approach, however, resulted in low editing efficiency across 7 different designs with various complementary annealing template lengths (<2% efficiency). They also tried inserting a self-splicing ribozyme within the pegRNA, which produces a functional pegRNA post-transcriptionally. The incorporation of a split-ribozyme, dependent on a PPI, could have been used to reconstitute the split pegRNA in an event-controlled manner. However again, only modest levels of editing were observed with the self-splicing ribozyme design (<2%). Finally, they tried splitting the pegRNA at the repeat:anti-repeat junction that was used to join the original dual-guide system comprised of a crRNA and tracrRNA, into a single-guide RNA. They incorporated the prime editing features into the tracrRNA half, to create petracrRNA. Dimerization was initially induced by different complementary RNA annealing sequences. Using this design, they were able to induce an editing efficiency of ~28% (compared to 37% efficiency using a positive control epegRNA guide).

      Having identified a suitable split pegRNA system, they next sought to induce the reconstitution of the two halves in a PPI-dependent manner. They replaced the complementary RNA annealing sequences with two different RNA aptamers (MS2 and BoxB). MS2 detects the MCP protein, while BoxB detects the LambdaN protein. Close proximity between MCP and LambdaN would thus bring together the two split pegRNA halves, creating a functional pegRNA that would enable prime editing at a specific target site. They demonstrated that they could induce MCP-LambdaN proximity by fusing them to different dimerizing protein partners: 1) constitutive epitope-nanobody/antibody pairs such as scFv/GCN4 or NbALFA/ALFA-Tag; 2) split-GFP; or 3) chemically-induced protein pairs such as FKBP/FRB or ABI/PYL. For all of these approaches, they could achieve between ~20-60% normalized editing efficiency (relative to positive control editing levels with epegRNA). Additional mutation of the linkers between the RNA and aptamers could increase editing efficiency but also increase non-specific background editing even in the absence of an induced PPI.

      Additional applications of this overall strategy included incorporating the design with different DNA base editors, with the most promising examples shown with the base editors CBE4max and ABE8. It should be noted that these specific examples used a non-physiological LambdaN-MCP direct fusion protein as the "bait" that induced reconstitution of the two halves of the guideRNA, rather than relying on a true induced PPI. They also demonstrated that the recently reported RADARS strategy could be incorporated into their system. In this example, they used an ADAR-guide-RNA to drive the expression of a LambdaN-PCP fusion protein in the presence of a specific target RNA molecule, IL6. This induced LambdaN-PCP protein could then reconstitute the split peg-RNAs to drive prime editing. To enable this last application, they replaced the MS2 aptamer in their pegRNA with the PP7 aptamer that binds the PCP protein (this was to avoid crosstalk with RADARS, which also uses MS2/MCP interaction). Using this strategy, they observed a normalized editing efficiency of around 12% (but observed non-specific editing of around 8% in the absence of the target RNA).
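      The normalized efficiencies quoted in this summary are ratios of the observed editing to the positive-control (epegRNA) editing. As a small worked illustration, the sketch below reproduces the approximate "76% of control" figure from the ~28% and 37% values mentioned above, and computes a signal-to-background ratio for the RADARS example. The function names are hypothetical, the inputs are the rounded values quoted in the review, and the assumption that the ~12% and ~8% values are on the same scale is ours.

```python
def normalized_efficiency(observed_pct, control_pct):
    """Express an editing efficiency as a percentage of the positive control."""
    if control_pct <= 0:
        raise ValueError("control efficiency must be positive")
    return 100.0 * observed_pct / control_pct


def signal_to_background(induced_pct, uninduced_pct):
    """Ratio of editing with the input present to editing without it."""
    if uninduced_pct <= 0:
        raise ValueError("background efficiency must be positive")
    return induced_pct / uninduced_pct


# Split crRNA/petracrRNA design (~28%) versus the epegRNA control (37%).
split_vs_control = normalized_efficiency(28.0, 37.0)

# RADARS-gated example: ~12% with target RNA, ~8% without it.
radars_ratio = signal_to_background(12.0, 8.0)

print(round(split_vs_control, 1), round(radars_ratio, 2))
```

      Reporting both numbers side by side makes the efficiency-versus-specificity tradeoff discussed in the review explicit: a design can recover most of the control efficiency while still showing a low ratio over background.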

      Strengths:

      The strengths of this paper include an interesting concept for engineering guide RNAs to enable activity-dependent genome editing in living cells in the future, based on discrete protein-protein interactions (either constitutively, spatially, or chemically induced). Important groundwork is laid down to engineer and improve these guide RNAs in the future (especially the work describing altering the linkers in Supplementary Figure 3 - which provides a path forward).

      Weaknesses:

      In its current state, the editing efficiency appears too low to be applied in physiological settings. Much of the latter work in the paper relies on a LambdaN-MCP direct fusion protein, rather than two interacting protein pairs. Further characterizations in the future, especially varying the transfection amounts/durations/etc of the various components of the system, would be beneficial to improve the system. It will also be important to demonstrate editing at additional sites; to characterize how long the PPI must be active to enable efficient prime editing; and how reversible the reconstitution of the split pegRNA is.

      Thank you for this assessment of the strengths and weaknesses of the P3 editing as it stands. Looking ahead, we agree that further optimizations will be important, including along the lines suggested by the reviewer, as will further characterization of the system with respect to dependencies, reversibility, etc. The revised Discussion (see below) now makes these points more clearly.

      Recommendations for the authors:

      Reviewing Editor comments:

      It would be helpful to better describe the nature of improvements (on-targeting and/or off-targeting) that would be needed to effectively use this approach in vitro and in vivo applications.

      We agree, and have accordingly revised the last paragraph of our discussion to better describe what improvements are needed for in vitro and in vivo applications:

      “In our view, there are four outstanding challenges for P3 editing to be broadly useful: evaluating additional cellular contexts, the method’s efficiency and specificity, understanding the limit of detectable protein-protein interactions, and the development of sensors compatible with multiplex P3 editing within the same cell. First, we have thus far only conducted P3 editing in HEK293T cells, and it obviously needs to be tested in additional cell types. Second, both the efficiency and specificity of the P3 editing need to be improved before it can be used as a selective editing tool in model systems. We have explored how modifying the crRNA and petracrRNA pair sequences can tune the efficiency-vs-specificity tradeoff, but alternative avenues to improvement (e.g., better docking of RNA-aptamers such as MS2, BoxB, or PP7 by testing more linker sequences that place crRNA and petracrRNA for duplex formation) may be more fruitful in terms of achieving high efficiency and specificity at once (e.g., >50% editing in the setting of a specific protein-protein interaction, and <1% editing without it). Third, it is not clear whether weak and transient interactions among proteins can be used to trigger P3 editing. Assuming the genome editing complex formation is reversible, improving P3 editing efficiency may be able to capture different strengths of protein-protein interactions, although some interactions may be too transient to promote functional guide RNA formation. Finally, the current P3 editing design uses a pair of RNA aptamers and their corresponding protein binders, limiting the multiplex detection of protein-protein pairs. More orthogonal protein-RNA pairs need to be identified (e.g., using a massively parallel platform (Buenrostro et al., 2014) and/or computational prediction (Baek et al., 2023)) to allow for large numbers of P3 sensors for different protein-protein interactions to be deployed within the same cell.
      Overcoming these four challenges is necessary for P3 editing to be broadly useful for gating genome editing on physiological levels of specific protein-protein interactions in a multiplex fashion.”

      Reviewer #2 (Recommendations For The Authors):

      It does not appear that all plasmids necessary to reproduce the results of this paper have been deposited to addgene, but only a small subset. The authors might include that these plasmids are available upon request, if not uploaded to a public repository.

      We have added a statement that additional plasmids are available upon request. Our Data Availability Statement reads (with the added sentence underlined):

      “Raw sequencing data have been uploaded to Sequence Read Archive (SRA) with the associated BioProject ID PRJNA1004865. The following plasmids have been deposited to Addgene: pU6-crRNA-MS2, pU6-BoxB-petracrRNA, pCMV-LambdaN-MCP, pCMV-LambdaN-NbALFA, and pCMV-ALFA-MCP (Addgene ID 207624 - 207628). The rest of the plasmids used in this study are available upon request.”

      It could be useful to include somewhere why, specifically, editing the guide RNAs as opposed to the Cas9 itself is advantageous. Light-inducible split Cas9s have been engineered, and I imagine other PPI-inducible split Cas9s have also been engineered. A specific mention of the advantages of using engineered split pegRNAs could put the significance of this work in a better context.

      Thanks for raising this, and we agree. We have revised the first paragraph of the Results section to highlight why we think splitting the guide RNAs as opposed to Cas9 might be advantageous:

      “In the split architecture, the “dimerization module” is a key sensor component. Although strategies that split the protein component of the genome editing complex have been described (e.g., split-Cas9 (Yu et al., 2020)), we reasoned that having the guide RNA serve as the dimerization module rather than the protein, i.e. by splitting it into two parts, and making the restoration of its function dependent on a molecular proximity event, would afford even more control. For example, if multiple split gRNAs were present within the same cell, they could be independently controlled, whereas a split Cas9 would only allow a single control point.  In our initial experiments, we focused on splitting the pegRNA used in prime editing.”

    1. eLife Assessment

      This study is part of an ongoing effort to clarify the effects of cochlear neural degeneration on auditory processing in listeners with normal audiograms. Here the authors provide important new data demonstrating associations between cochlear neural degeneration, non-invasive assays of auditory processing, and speech perception. Based on a cross-species comparison, these findings pose compelling evidence that cochlear synaptopathy is associated with a significant part of hearing difficulties in complex environments for some listeners with normal hearing thresholds, such as older individuals.

    2. Reviewer #1 (Public review):

      This study is part of an ongoing effort to clarify the effects of cochlear neural degeneration (CND) on auditory processing in listeners with normal audiograms. This effort is important because ~10% of people who seek help for hearing difficulties have normal audiograms and current hearing healthcare has nothing to offer them.

      The authors identify two shortcomings in previous work that they intend to fix. The first is a lack of cross-species studies that make direct comparisons between animal models in which CND can be confirmed and humans for which CND must be inferred indirectly. The second is the low sensitivity of purely perceptual measures to subtle changes in auditory processing. To fix these shortcomings, the authors measure envelope following responses (EFRs) in gerbils and humans using the same sounds, while also performing histological analysis of the gerbil cochleae, and testing speech perception while measuring pupil size in the humans.

      The study begins with a comprehensive assessment of the hearing status of the human listeners. The only differences found between the young adult (YA) and middle-aged (MA) groups are in thresholds at frequencies > 10 kHz and DPOAE amplitudes at frequencies > 5 kHz. The authors then present the EFR results, first for the humans and then for the gerbils, showing that amplitudes decrease more rapidly with increasing envelope frequency for MA than for YA in both species. The histological analysis of the gerbil cochleae shows that there were, on average, 20% fewer IHC-AN synapses at the 3 kHz place in MA relative to YA, and the number of synapses per IHC was correlated with the EFR amplitude at 1024 Hz.

      The study then returns to the humans to report the results of the speech perception tests and pupillometry. The correct understanding of keywords decreased more rapidly with decreasing SNR in MA than in YA, with a noticeable difference at 0 dB, while pupillary slope (a proxy for listening effort) increased more rapidly with decreasing SNR for MA than for YA, with the largest differences at SNRs between 5 and 15 dB. Finally, the authors report that a linear combination of audiometric threshold, EFR amplitude at 1024 Hz, and a few measures of pupillary slope is predictive of speech perception at 0 dB SNR.

      I only have two questions/concerns about the specific methodologies used:

      (1) Synapse counts were made only at the 3 kHz place on the cochlea. However, the EFR sounds were presented at 85 dB SPL, which means that a rather large section of the cochlea will actually be excited. Do we know how much of the EFR actually reflects AN fibers coming from the 3 kHz place? And are we sure that this is the same for gerbils and humans given the differences in cochlear geometry, head size, etc.?

      (2) Unless I misunderstood, the predictive power of the final model was not tested on held-out data. The standard way to fit and test such a model would be to split the data into two segments, one for training and hyperparameter optimization, and one for testing. But it seems that the only split was for training and hyperparameter optimization.
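
      The held-out evaluation described in point (2) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, the model is a one-predictor least-squares line, and the helper names (`train_test_split`, `fit_line`, `r_squared`) are hypothetical. The point is only that the model is fitted on one segment and scored on a segment it has never seen.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(pairs, intercept, slope):
    """Coefficient of determination of the fitted line on the given pairs."""
    mean_y = sum(y for _, y in pairs) / len(pairs)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pairs)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT from the study): x is a predictor, y an outcome.
noise = random.Random(1)
data = [(float(x), 2.0 + 0.5 * x + noise.gauss(0, 1.0)) for x in range(40)]

train, test = train_test_split(data)
intercept, slope = fit_line(train)  # the fit sees only the training segment
print("train R^2:   ", round(r_squared(train, intercept, slope), 3))
print("held-out R^2:", round(r_squared(test, intercept, slope), 3))
```

      Any hyperparameter search would be run inside the training segment only (for example by cross-validation), so that the held-out score remains an unbiased estimate of predictive power.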

      While I find the study to be generally well executed, I am left wondering what to make of it all. The purpose of the study with respect to fixing previous methodological shortcomings was clear, but exactly how fixing these shortcomings has allowed us to advance is not. I think we can be more confident than before that EFR amplitude is sensitive to CND, and we now know that measures of listening effort may also be sensitive to CND. But where is this leading us?

      I think what this line of work is eventually aiming for is to develop a clinical tool that can be used to infer someone's CND profile. That seems like a worthwhile goal but getting there will require going beyond exploratory association studies. I think we're ready to start being explicit about what properties a CND inference tool would need to be practically useful. I have no idea whether the associations reported in this study are encouraging or not because I have no idea what level of inferential power is ultimately required.

      That brings me to my final comment: there is an inappropriate emphasis on statistical significance. The sample size was chosen arbitrarily. What if the sample had been half the size? Then few, if any, of the observed effects would have been significant. What if the sample had been twice the size? Then many more of the observed effects would have been significant (particularly for the pupillometry). I hope that future studies will follow a more principled approach in which relevant effect sizes are pre-specified (ideally as the strength of association that would be practically useful) and sample sizes are determined accordingly.
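
      As a rough illustration of determining sample size from a pre-specified effect size, the sketch below uses the standard Fisher z approximation for a two-sided test of a Pearson correlation. The assumptions are mine and not taken from the study: the association is expressed as a correlation, the significance level is 0.05, the power is 0.80, and the function name `required_n_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation of size r.

    Uses the Fisher z approximation: n = ((z_alpha + z_power) / atanh(r))^2 + 3,
    rounded up, for a two-sided test at level alpha.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    effect = math.atanh(abs(r))                    # Fisher z-transformed effect size
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


# Illustrative target effect sizes (not values reported in the study).
for target_r in (0.2, 0.3, 0.5):
    print(target_r, required_n_for_correlation(target_r))
```

      The required sample grows quickly as the smallest practically useful association shrinks, which is why the effect size has to be fixed before the sample size is chosen.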

      So, in summary, I think this study is a valuable but limited advance. The results increase my confidence that non-invasive measures can be used to infer underlying CND, but I am unsure how much closer we are to anything that is practically useful.

    3. Reviewer #2 (Public review):

      Summary:

      This paper addresses the bottom-up and top-down causes of hearing difficulties in middle-aged adults with clinically-normal audiograms using a cross-species approach (humans vs. gerbils, each with two age groups) mixing behavioral tests and electrophysiology. The study is not only a follow-up of Parthasarathy et al (eLife 2020), since there are several important differences.

      Parthasarathy et al. (2020) only considered a group of young normal-hearing individuals with normal audiograms yet with high complaints of hearing in noisy situations. Here, this issue is considered specifically regarding aging, using a between-subject design comparing young NH and older NH individuals recruited from the general population, without additional criterion (i.e. no specifically high problems of hearing in noise). In addition, this is a cross-species approach, with the same physiological EFR measurements with the same stimuli deployed on gerbils.

      This article is of very high quality. It is extremely clear, and the results show clearly a decrease of neural phase-locking to high modulation frequencies in both middle-aged humans and gerbils, compared to younger groups/cohorts. In addition, pupillometry measurements conducted during the QuickSIN task suggest increased listening efforts in middle-aged participants, and a statistical model including both EFRs and pupillometry features suggests that both factors contribute to reduced speech-in-noise intelligibility evidenced in middle-aged individuals, beyond their slight differences in audiometric thresholds (although they were clinically normal in both groups).

      These results provide strong support for the view that normal aging in humans leads to auditory nerve synaptic loss (cochlear neural degeneration, CND, or, put differently, cochlear synaptopathy) as well as increased listening effort, before any clearly visible audiometric deficits as defined in current clinical standards. This result is very important for the community since we are still missing direct evidence that cochlear synaptopathy underlies a significant part of hearing difficulties in complex environments for listeners with normal thresholds, such as middle-aged and senior listeners. This paper shows that these difficulties can be reasonably well accounted for by this sensory disorder (CND), but also that listening effort, i.e. a top-down factor, further contributes to this problem. The methods are sound and well described and I would like to emphasize that they are presented concisely yet in a very precise manner so that they can be understood very easily - even for a reader who is not familiar with the employed techniques. I believe this study will be of interest to a broad readership.

      I have some comments and questions which I think would make the paper even stronger once addressed.

      Main comments:

      (1) Presentation of EFR analyses / Interpretation of EFR differences found in both gerbils and humans:

      a) Could the authors comment further on why they think they found a significant difference only at the highest mod. frequency of 1024 Hz in their study? Indeed, previous studies employing SAM or RAM tones very similar to the ones employed here were able to show age effects already at lower modulation freqs. of ~100 Hz; e.g. there are clear age effects reported in human studies of Vasilkov et al. (2021) or Mepani et al. (2021), and also in animals (see Garrett et al. bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2024/04/30/2020.06.09.142950.full.pdf).

      Furthermore, some previous EEG experiments in humans that used SAM tones with modulation freqs. of ~100 Hz showed that EFRs do not exhibit a single peak, i.e. there are peaks not only at fm but also at the first harmonics (e.g. 2*fm or 3*fm); see e.g. Garrett et al. bioRxiv https://www.biorxiv.org/content/biorxiv/early/2024/04/30/2020.06.09.142950.full.pdf.

      Did the authors try to extract EFR strength by looking at the summed amplitude of multiple peaks (Vasilkov Hear Res. 2021), in particular for the lower modulation frequencies? (indeed, there will be no harmonics for the higher mod. freqs).

      b) How do the present EFR results relate to FFR results, where effects of age are already seen at low carrier freqs? (e.g. Märcher-Rørsted et al., Hear. Res., 2022 for pure tones with freq < 500 Hz). Do the authors think it could be explained by the fact that this is not the same cochlear region, and that synapses die earlier in higher compared to lower CFs? This should be discussed. Beyond the main group effect of age, were there no negative correlations of EFRs with age in the data?

      (2) Size of the effects / comparing age effects between two species:

      Although the size of the age effect on EFRs cannot be directly compared between humans and gerbils - the comparison remains qualitative - could the authors at least provide references regarding the rate of synaptic loss with aging in both humans and gerbils, so that we understand that the yNH/MA difference can be compared between the two age groups used for gerbils; it would have been critical in case of a non-significant age effect in one species.

      Equalization/control of stimuli differences across the two species: For measuring EFRs, SAM stimuli were presented at 85 dB SPL for humans vs. 30 dB above the detection threshold (inferred from ABRs) for gerbils - I do not think the results strongly depend on this choice, but it would be good to comment on why you did not choose also to present stimuli 30 dB above thresholds in humans.

      Simulations of EFRs using functional models could have been used to understand (at least in humans) how the differences in EFRs obtained between the two groups are *quantitatively* compatible with the differences in % of remaining synaptic connections known from histopathological studies for their age range (see the approach in Märcher-Rørsted et al., Hear. Res., 2022)

      (3) Synergetic effects of CND and listening effort:

      Could you test whether there is an interaction between CND and listening effort? (e.g. one could hypothesize that MA subjects with the largest CND also have higher listening effort).

    1. eLife Assessment

      Although others have proposed that OHC electromotility subserves cochlear amplification by acting as a "fluid pump", and evidence for this has been found using electrical stimulation of excised cochleae, this important study substantially advances our understanding of cochlear homeostasis. This is the first report to test the pumping effect in vivo and consider its implications for cochlear homeostasis and drug delivery. The manuscript provides convincing evidence for OHC-based fluid flow within the cochlea.

    2. Reviewer #1 (Public review):

      Summary:

      The authors test the "OHC-fluid-pump" hypothesis by assaying the rates of kainic acid dispersal both in quiet and in cochleae stimulated by sounds of different levels and spectral content. The main result is that sound (and thus, presumably, OHC contractions and expansions) result in faster transport along the duct. OHC involvement is corroborated using salicylate, which yielded results similar to silence. Especially interesting is the fact that some stimuli (e.g., tones) seem to provide better/faster pumping than others (e.g., noise), ostensibly due to the phase profile of the resulting cochlear traveling-wave response.

      Strengths:

      The experiments appear well controlled and the results are novel and interesting. Some elegant cochlear modeling that includes coupling between the organ of Corti and the surrounding fluid as well as advective flow supports the proposed mechanism.

      The current limitations and future directions of the study, including possible experimental tests, extensions of the modeling work, and practical applications to drug delivery, are thoughtfully discussed.

      Weaknesses:

      Although the authors provide compelling evidence that OHC motility can usefully pump fluid, their claim (last sentence of the Abstract) that wideband OHC motility (i.e., motility in the "tail" region of the traveling wave) evolved for the purposes of circulating fluid---rather than emerging, say, as a happy by-product of OHC motility that evolved for other reasons---seems too strong.

    3. Reviewer #2 (Public review):

      Although recent cochlear micromechanical measurements in living animals have shown that outer hair cells drive broadband vibration of the reticular lamina, the role of this vibration in cochlear fluid circulation remains unknown. The authors hypothesized that motile outer hair cells may facilitate cochlear fluid circulation. To test this hypothesis, they investigated the effects of acoustic stimuli and salicylate, an outer hair cell motility blocker, on kainic acid-induced changes in the cochlear nucleus activities. The results demonstrated that acoustic stimuli reduced the latency of the kainic acid effect, with low-frequency tones being more effective than broadband noise. Salicylate reduced the effect of acoustic stimuli on kainic acid-induced changes. The authors also developed a computational model to provide a physical framework for interpreting experimental results. Their combined experimental and simulated results indicate that broadband outer hair cell action serves to drive cochlear fluid circulation.

      The major strengths of this study lie in its high significance and the synergistic use of electrophysiological recording of the cochlear nucleus responses alongside computational modeling. Cochlear outer hair cells have long been believed to be responsible for the exceptional sensitivity, sharp tuning, and huge dynamic range of mammalian hearing. However, recent observations of the broadband reticular lamina vibration contradict the widely accepted view of frequency-specific cochlear amplification. Furthermore, there is currently no effective noninvasive method to deliver drugs or genes to the cochlea, a crucial need for treating sensorineural hearing loss, one of the most common auditory disorders. This study addresses these important questions by observing outer hair cells' roles in the cochlear transport of kainic acid. The well-established electrophysiological method used to record cochlear nucleus responses produced valuable new data, and the custom-developed computational model greatly enhanced the interpretation of the experimental results.

      The authors successfully tested their hypothesis, with both the experimental and modeling results supporting the conclusion that active outer hair cells can enhance cochlear fluid circulation in the living cochlea.

      The findings from this study can potentially be applied for treating sensorineural hearing loss and advance our understanding of how outer hair cells contribute to cochlear amplification and normal hearing.

    4. Reviewer #3 (Public review):

      Summary:

      This study reveals that sound exposure enhances drug delivery to the cochlea through the non-selective action of outer hair cells. The efficiency of sound-facilitated drug delivery is reduced when outer hair cell motility is inhibited. Additionally, low-frequency tones were found to be more effective than broadband noise for targeting substances to the cochlear apex. Computational model simulations support these findings.

      Strengths:

      The study provides compelling evidence that the broad action of outer hair cells is crucial for cochlear fluid circulation, offering a novel perspective on their function beyond frequency-selective amplification. Furthermore, these results could offer potential strategies for targeting and optimizing drug delivery throughout the cochlear spiral.

      Weaknesses:

      The primary weakness of this paper lies in the surgical procedure used for drug administration through the round window. Opening the cochlea can alter intracochlear pressure and disrupt the traveling wave from sound, a key factor influencing outer hair cell activity. However, the authors do not provide sufficient details on how they managed this issue during surgery. Additionally, the introduction section needs further development to better explain the background and emphasize the significance of the work.

      Comments on revisions:

      Thank you for addressing the comments and concerns. The author has responded to all points thoroughly and clarified them well. However, please include the key points from the responses to the comments (Introduction ((3), (5)) and Results ((5)) into the manuscript. While the explanations in the response letter are reasonable, the current descriptions in the manuscript may limit the reader's understanding. Expanding on these points in the Introduction, Results, or Discussion sections would enhance clarity and comprehensiveness.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors test the "OHC-fluid-pump" hypothesis by assaying the rates of kainic acid dispersal both in quiet and in cochleae stimulated by sounds of different levels and spectral content. The main result is that sound (and thus, presumably, OHC contractions and expansions) results in faster transport along the duct. OHC involvement is corroborated using salicylate, which yielded results similar to silence. Especially interesting is the fact that some stimuli (e.g. tones) seem to provide better/faster pumping than others (e.g. noise), ostensibly due to the phase profile of the resulting cochlear traveling-wave response.

      Strengths:

      The experiments appear well controlled and the results are novel and interesting. Some elegant cochlear modeling that includes coupling between the organ of Corti and the surrounding fluid as well as advective flow supports the proposed mechanism.

      Weaknesses:

      It's not clear whether the effect size (e.g., the speed of sound-induced pumping relative to silence) is large enough to have important practical applications (e.g., for drug delivery). The authors should comment on the practical requirements and limitations.

      With our current data, what we can conclude is that modest sound levels (e.g., 75 dB SPL noise or an 80 dB SPL tone) facilitate cochlear drug delivery. We added a paragraph to the Discussion stating some future considerations for application to drug delivery in the human cochlea.

      Although helpful so far as it goes, the modeling could be taken much further to help understand some of the more interesting aspects of the data and to obtain testable predictions. In particular, the authors should systematically explore the level effects they find experimentally and determine whether the model can replicate the finding that different sounds produce different results (e.g. noise vs tone).

      The model should also be used to relate the model's flow rates more quantitatively to the properties of the traveling wave (e.g., its phase profile).

      The present study is focused on explaining the principle of mass transport in the cochlea. The quantification of the relationship between flow rate and traveling wave is an important open question and will be the topic of future studies. Our previous modeling study (Shokrian et al. 2020) showed a clear relation between the traveling wave characteristics (e.g., amplitude and phase velocity) and the mass transport in the Corti fluid. As the reviewer correctly pointed out, the current paper is focused on designing controlled experiments to provide proof of concept along with computational simulations to support our major claim (that outer hair cells stir cochlear fluid). 

      Finally, the model should be used to investigate differences between active and passive OHCs (e.g., simulating the salicylate experiment by disabling the model's OHCs).

      What the reviewer asks for has been demonstrated in previous theoretical studies (Lighthill, 1992; Edom, Obrist, Kleiser, 2014; Sumner, Reichenbach, 2021). In some of these studies, the effect was called steady streaming. These studies are excellent examples because they simulated the sensitive cochlea (similar level of basilar membrane vibrations) but did not incorporate the Corti fluid peristalsis. Even without the peristaltic motion of the Corti tube, the basilar membrane-scala fluid interaction generated steady streaming (creeping fluid flow). However, the streaming velocity of cochlear models without active peristalsis along the Corti tube is about three orders of magnitude smaller than that of the active cochlea at a comparable level of basilar membrane vibrations. For example, the peak streaming speed was < 0.1 µm/s at 80 dB SPL, and it took > 4 hours for particles to travel 1 mm. This speed is much slower than the particle transport speed due to pure diffusion (Sumner, Reichenbach, 2021).

      The manuscript would be stronger if the authors discussed ways to test their hypothesis that OHC motility serves a protective effect by pumping fluid. For example, do animals held in quiet after noise exposure (TTS) take longer to recover?

      We agree with the reviewer. The following statements were added to the Discussion section. “Our results have implications for cochlear fluid homeostasis. For example, future studies can test the hypothesis that an acoustically rich environment would be beneficial in maintaining healthy hearing as well as in recovering from transient hearing loss.”

      Reviewer #2 (Public review):

      Summary:

      Recent cochlear micromechanical measurements in living animals demonstrated outer hair cell-driven broadband vibration of the reticular lamina that contradicts frequency-selective cochlear amplification. The authors hypothesized that motile outer hair cells can drive cochlear fluid circulation. This hypothesis was tested by observing the effects of acoustic stimuli and salicylate, an outer hair cell motility blocker, on kainic acid-induced changes in the cochlear nucleus activities. It was found that acoustic stimuli can reduce the latency of the kainic acid effect, and a low-frequency tone is more effective than broadband noise. Salicylate reduced the effect of acoustic stimuli on kainic acid-induced changes. The authors also developed a computational model to provide the physical basis for interpreting experimental results. It was concluded that experimental data and simulations coherently indicate that broadband outer hair cell action is for cochlear fluid circulation.

      Strengths:

      The major strengths of this study include its high significance and the combination of electrophysiological recording of the cochlear nucleus responses with computational modeling. Cochlear outer hair cells have been believed to be responsible for the exceptional sensitivity, sharp tuning, and huge dynamic range of mammalian hearing. Recent observation of the broadband reticular lamina vibration contradicts frequency-specific cochlear amplification. Moreover, there is no effective noninvasive approach to deliver the drugs or genes to the cochlea for treating sensorineural hearing loss, one of the most common auditory disorders. These important questions were addressed in this study by observing outer hair cells' roles in the cochlear transport of kainic acid. The well-established electrophysiological method for recording cochlear nucleus responses produced valuable new data, and the purposely developed computational model significantly enhanced the interpretation of the data.

      The authors successfully tested their hypothesis, and both the experimental and modeling results support the conclusion that active outer hair cells can drive cochlear fluid circulation in the living cochlea.

      Findings from this study will help auditory scientists understand how the outer hair cells contribute to cochlear amplification and normal hearing.

      We thank the reviewer for acknowledging our effort.

      Weaknesses:

      While the statement "The present study provides new insights into the nonselective outer hair cell action (in the second paragraph of Discussion)" is well supported by the results, the authors should consider providing a prediction or speculation of how this hair cell action enhances cochlear sensitivity. Such discussion would help the readers better understand the significance of the current work.

      We added a potential implication to the Discussion, that an acoustically rich environment could be beneficial in maintaining healthy hearing as well as recovering from damaged hearing.

      Reviewer #3 (Public review):

      Summary:

      This study reveals that sound exposure enhances drug delivery to the cochlea through the nonselective action of outer hair cells. The efficiency of sound-facilitated drug delivery is reduced when outer hair cell motility is inhibited. Additionally, low-frequency tones were found to be more effective than broadband noise for targeting substances to the cochlear apex. Computational model simulations support these findings.

      Strengths:

      The study provides compelling evidence that the broad action of outer hair cells is crucial for cochlear fluid circulation, offering a novel perspective on their function beyond frequency-selective amplification. Furthermore, these results could offer potential strategies for targeting and optimizing drug delivery throughout the cochlear spiral.

      Weaknesses:

      The primary weakness of this paper lies in the surgical procedure used for drug administration through the round window. Opening the cochlea can alter intracochlear pressure and disrupt the traveling wave from sound, a key factor influencing outer hair cell activity. However, the authors do not provide sufficient details on how they managed this issue during surgery. Additionally, the introduction section needs further development to better explain the background and emphasize the significance of the work.

      Although we wrote that the inner ear was left intact, this might not have been sufficiently clear. Our surgical approach leaves the inner ear intact, including the round-window membrane. The round window in gerbil is concave like a bowl. We applied 4 µL of kainic acid solution in the round-window niche, without perforating the round-window membrane.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations for the authors):

      The authors' choice to frame their findings by hinting that they have discovered the "real" reason for the evolution of broadband OHC electromotility (e.g., the first and last sentences of the abstract and parts of the Discussion), although clearly intended to boost the perceived significance of the work, does them no favors and will probably lead to distracting criticisms they could easily have avoided. The manuscript would be significantly improved by removing or downplaying these rather speculative and unsupported claims; the work stands on its own without them.

      We agree that the first line of the Abstract might distract the readers. Meanwhile, in the Discussion, we believe the readers will appreciate our speculation of how this study is relevant to recent debates on hearing mechanics. Following the reviewer’s advice, we have revised the Abstract.

      Reviewer #3 (Recommendations for the authors):

      Please review the detailed comments below. I hope they contribute to enhancing the paper:

      We thank the reviewer for this detailed advice. All of these comments make good sense and were very helpful in improving this paper or in planning future studies. 

      Many of the comments concern the computer model, and they share one common basis that we have not yet achieved: simulating the level dependence.

      I. Introduction

      (1) Please clarify and improve this sentence. Effective and safe strategies for delivering treatments to the inner ear have been reported: 'Consequently, intervening in hearing health by delivering substances to the inner-ear fluid is challenging'.

      The preceding statement concerns the blood-labyrinthine barrier (BLB), which is comparable to the blood-brain barrier (BBB). We revised the statement: “Consequently, intervening in hearing health by delivering substances to the inner-ear fluid through systemic circulation is challenging.”

      (2) Please expand on how the secretion and absorption of ions and molecules maintain the unique ionic compositions of the two intracochlear fluids. Include details on the role of the stria vascularis and the specific functions of the three types of strial cells in this process.

      In response to this request, we added a paragraph discussing cochlear fluid homeostasis. Our study is different from existing homeostasis studies in three regards. First, the site: Existing studies are centered on the stria vascularis, while this study concerns the Corti fluid. Second, the mechanism: Existing studies are regarding metabolic transport, while our scope is the transport due to fluid flow. Third, the range: Existing studies considered local electrochemical equilibrium within a radial section, while this study concerns global (longitudinal) mass transport. To address this comment, the following was added to the Discussion.

      “Our study complements existing studies regarding cochlear fluid homeostasis and differs from previous studies in several ways. The intrastrial fluids (extracellular fluids in the stria vascularis) have been more thoroughly investigated because the three layers in the stria vascularis (marginal, intermediate, and basal cells) maintain the endocochlear potential (Wangemann 2006).

      Equilibrium in the Corti fluid has been sparsely investigated because its electrochemical gradient is modest compared to that of the intrastrial fluids (Johnstone, Patuzzi et al. 1989; Zidanic and Brownell 1990). Local electrochemical balance in the cochlear fluids has been considered within a radial section (Quraishi and Raphael 2008; Patuzzi 2011; Nin, Hibino et al. 2012). Our study is focused on the longitudinal (global) equilibrium along the cochlear coil and did not consider the equilibrium across the stria vascularis cell layers. To examine whether the longitudinal fluid flow driven by outer hair cells is strong enough to affect cochlear fluid homeostasis, future studies should measure the K+ equilibrium and recycling along the length of the Corti fluid under sound and silence conditions.”

      (3) Please provide a more detailed explanation and definition of a longitudinal electrochemical gradient, including how it functions and its relevance in physiological processes.

      The most researched electrochemical gradient of the cochlea is probably the endocochlear potential, which varies along the cochlear length. The endocochlear potential at any location is determined by the equilibrium between the source and the sink. From the viewpoint of the Corti fluid, the source is the potassium current out of the hair cells and the sink is the resorption of potassium by supporting cells. The effect of a longitudinal electrochemical gradient on hearing physiology is beyond the scope of this study. Addressing it would require incorporating detailed K+ equilibrium dynamics. This certainly is one of our future directions.

      (4) Please include the necessary references to support these three sentences: "Diffusion is an effective mechanism for a substance to travel along submicrometer distances. For instance, it takes microseconds for neurotransmitters to diffuse across a 20-nm synaptic gap. In contrast, diffusion is inefficient for travel on the centimeter scale. It takes days for a drug applied at the round window to travel 30 mm to the apical end of the human cochlea. In practice, the substance would not reach the apex because it would be resorbed before traveling the distance".   

      A reference was added (Berg, 1993). Our description of diffusion is based on the fundamental physics of Fick’s laws.

      (5) In paragraph 3, the author only discussed a portion of the previous approaches. There are numerous methods for inner ear delivery, including external, middle ear, and direct inner ear delivery via the round window or semicircular canal. Each method has its pros and cons, which the authors should carefully address. For example, the semicircular canal approach doesn't require two perforations in the inner ear and distributes the injection evenly throughout the cochlea.  

      A recent review paper regarding inner ear drug delivery was added as a reference (Szeto, Chiang et al. 2020). Drug delivery is a means to demonstrate the OHC’s role in longitudinal mass transport. We are concerned that comparing different drug delivery modalities in detail would distract the readers from the main point of this study. We mentioned ‘one remedy’ with two perforations, for which abundant case studies are found in the literature. An exhaustive discussion of existing approaches is better left to review papers.

      (6) The following sentence is inaccurate and should be carefully rephrased. Previous reports chose higher volumes than the actual fluid volume to maximize the drug (or gene) effect, but this was not a requirement of the delivery methods: 'Such an invasive approach requires the injection of a substantial fluid volume, larger than the entire perilymph in the inner ear'.

      We revised the statement to relax the wording ‘require’: ‘Such an invasive approach is often associated with the injection of a substantial fluid volume, larger than the entire perilymph in the inner ear (Szeto, Chiang et al. 2020)'. This statement might be acceptable because we found few invasive delivery papers that used < 1 µL. Moreover, the physics basis of the injection method is to replace the fluid in a labyrinth compartment with a new fluid (a good example where this fluid physics was tested with quantitative data is the Lichtenhan et al. 2016 paper).

      (7) Please provide the necessary references. Also, clarify what is meant by 'actuator cells'. Are you referring to hair cells?: 'The tube-shaped organ of Corti (OoC) is lined with actuator cells and the cells are activated systematically with a large phase velocity (> a few m/s) toward the apex'.

      Yes, we meant OHCs as the actuator cells. This point has been clarified. A reference for the phase velocity has been added (Olson, Duifhuis, Steele, 2012).

      II. Results

      (1) Is there a specific reason you use 60 or 75 dB SPL for broadband sounds, but opt for louder sounds (80 dB SPL) for pure tones?

      It is not straightforward to compare the SPL between broadband noise and a pure tone, and we did not attempt to ‘equate’ them in any way. 

      (2) Please provide specific details about the sound generation protocol, including the duration, start time, end time, and any other relevant parameters. Here is an example of a vague sentence. Do you play the sounds continuously during these time periods, or only at specific intervals?: 'In two example cases, the effect time at low-CF locations (CFs near 2 kHz) was 15 minutes for the case of the 0.5 kHz tone (Fig. 3A)'

      It is described in the Measurement protocol part of the Methods section (see the red text below). In the example case and all other cases, the sounds were played continually (not continuously).

      For the “Sound” protocol, 1.1-s noise pips (60 or 75 dB SPL, 0.1-12 kHz bandwidth, 0.8-s duration including 0.15-s onset/offset ramps) were presented continually. After 48 noise pips, one 1.1-s silent pause and three CF tone pips followed (a total of 51 pips and a pause make a 57.2-s sequence). The CF tone pips were presented at the level of 35 dB SPL to monitor neural responses. The silence pause was to monitor spontaneous neural responses. The sequence was repeated until neural signals at the lowest CF site were completely abolished. The neural responses presented in this study are the ‘driven responses’ obtained by subtracting the spontaneous responses from the responses to the 35 dB CF tones. For the “Silence” or “Pure-tone” protocol, the noise pips of the Sound protocol were replaced with either silence pauses or a pure tone at 80 dB SPL.
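      The timing arithmetic in this protocol can be checked with a short sketch. It assumes that every pip and the silent pause each occupy one 1.1-s slot (the text states this for the noise pips and the pause; applying it to the CF tone pips is an assumption made here), and the variable names are illustrative only.

```python
# Timing of one "Sound" protocol sequence, using the values quoted above.
# Assumption: every pip and the pause each occupy one 1.1-s slot.
SLOT_S = 1.1            # duration of one slot, in seconds
NOISE_ON_S = 0.8        # sound duration inside a noise slot, ramps included
N_NOISE_PIPS = 48
N_SILENT_PAUSES = 1
N_CF_TONE_PIPS = 3

n_pips = N_NOISE_PIPS + N_CF_TONE_PIPS    # noise pips plus CF tone pips
n_slots = n_pips + N_SILENT_PAUSES        # pips plus the silent pause
sequence_s = n_slots * SLOT_S             # duration of one full sequence

# Fraction of each noise slot that actually contains sound.
duty_cycle = NOISE_ON_S / SLOT_S

print(f"{n_pips} pips + {N_SILENT_PAUSES} pause = {n_slots} slots")
print(f"sequence duration: {sequence_s:.1f} s")
print(f"noise duty cycle within a slot: {duty_cycle:.2f}")
```

      Under that assumption the slot count reproduces the stated 51 pips plus one pause, and the product of slot count and slot duration matches the stated 57.2-s sequence.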

      (3) Providing a schematic timeline of your experiments indicating sound generation, kainic acid (and salicylate) application, as well as DPOAE and AVCN recordings would greatly help in understanding and following your results.

      We have revised Figure 2.

      (4) How did you control the opening(s) for the injection? The openings could alter intracochlear pressure and affect the traveling wave from the sound, which is the major factor influencing outer hair cell activity.

      We did not open the inner ear. The round window remained intact. Opening the bulla does not affect the intracochlear pressure. We have clarified this issue, beginning with the first sentence of the Abstract. Thanks for raising this important question.

      (5) Is there any reason why the author generated only low and mid-frequencies? If so, please address what the limitations were in testing high frequency.

      There are no limitations to testing high frequencies. High frequencies would not affect drug delivery to the apex of the cochlea because the traveling waves stop right after the CF location. We are interested in delivering drugs deeper into the apex. Our presented results support this reasoning: mid-frequency stimulation was less effective for delivery to the low CF location.

      (6) I suggest combining Figures 3E and 3F to facilitate a direct comparison between the Silence and Noise conditions, as the MF and LF plots are overlapping in these panels.

      We considered this change but realized that it might introduce confusion and difficulty in parsing the results. Moreover, the two panels have their respective messages. 

      (7) In Figure 3E, why does the LF tone affect both Low and Mid CFs, while the MF tone only affects Mid CF?

      The cochlear traveling wave stops right after the CF location. Peristaltic action takes place in the broad tail region of the traveling waves (see Fig. 5C).

      III. Materials and Methods

      (1) Please provide details about your injection protocol. Did you create additional perforations? How did you target the round window? What was the injection rate? How did you seal the round window, and so on?

      The inner ear including the round window was left intact. Only the bulla was open.

      (2) Please include details about your surgical procedure for the AVCN recording, including probe insertion.

      AVCN recording is a well-established technique. Instead of reintroducing the method, we added a classical reference with a more accessible description (Frisina, Chamberlain, et al., 1982).

      IV. Minor points

      (1) Please include the full terms for the abbreviations 'CF', 'DPOAEs', 'PT', 'IP', and 'RW' for readers who are not in the hearing research field.

      We have checked that these abbreviations were defined.

      (2) Are 'GXXX's in figures animal identifiers? Please clarify what they represent.

      Yes, they are animal identifiers. We have clarified this point in the Fig. 1 caption.

    1. eLife Assessment

      This study presents valuable findings related to seasonal brain size plasticity in the Eurasian common shrew (Sorex araneus), which is an excellent model system for these studies. The evidence supporting the authors' claims is convincing. However, the authors should be careful when applying the term adaptive to the gene expression changes they observe; it would be challenging to demonstrate the differential fitness effects of these gene expression changes. The work will be of interest to biologists working on neuroscience, plasticity, and evolution.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Thomas et al. set out to study seasonal brain gene expression changes in the Eurasian common shrew. This mammalian species is unusual in that it does not hibernate or migrate but instead stays active all winter while shrinking and then regrowing its brain and other organs. The authors previously examined gene expression changes in two brain regions and the liver. Here, they added data from the hypothalamus, a brain region involved in the regulation of metabolism and homeostasis. The specific goals were to identify genes and gene groups that change expression with the seasons and to identify genes with unusual expression compared to other mammalian species. The reason for this second goal is that genes that change with the season could be due to plastic gene regulation, where the organism simply reacts to environmental change using processes available to all mammals. Such changes are not necessarily indicative of adaptation in the shrew. However, if the same genes are also expression outliers compared to other species that do not show this overwintering strategy, it is more likely that they reflect adaptive changes that contribute to the shrew's unique traits.

      The authors succeeded in implementing their experimental design and identified significant genes in each of their specific goals. There was an overlap between these gene lists. The authors provide extensive discussion of the genes they found.

      The scope of this paper is quite narrow, as it adds gene expression data for only one additional tissue compared to the authors' previous work in a 2023 preprint. The two papers even use the same animals, which had been collected for that earlier work. As a consequence, the current paper is limited in the results it can present. This is somewhat compensated by an expansive interpretation of the results in the discussion section, but I felt that much of this was too speculative. More importantly, there are several limitations to the design, making it hard to draw stronger conclusions from the data. The main contribution of this work lies in the generated data and the formulation of hypotheses to be tested by future work.

      Strengths:

      The unique biological model system under study is fascinating. The data were collected in a technically sound manner, and the analyses were done well. The paper is overall very clear, well-written, and easy to follow. It does a thorough job of exploring patterns and enrichments in the various gene sets that are identified.

      I specifically applaud the authors for doing a functional follow-up experiment on one of the differentially expressed genes (BCL2L1), even if the results did not support the hypothesis. It is important to report experiments like this and it is terrific to see it done here.

      Weaknesses:

      While the paper successfully identifies differentially expressed seasonal genes, the real question is (as explained by the authors) whether these are evolved adaptations in the shrews or whether they reflect plastic changes that also exist in other species. This question was the motivation for the inter-species analyses in the paper, but in my view, these cannot rigorously address this question. Presumably, the data from the other species were not collected in environments comparable to those experienced by the shrews studied here. Instead, they likely (it is not specified, and might not be knowable for the public data) reflect baseline gene expression. To see why this is problematic, consider this analogy: if we were to compare gene expression in the immune system of an individual undergoing an acute infection to that of other, uninfected individuals, we would see many strong expression differences. However, it would not be appropriate to claim that the infected individual has unique features - the relevant physiological changes are simply not triggered in the other individuals. The same applies here: it is hard to draw conclusions from a comparison of seasonal expression data in the shrews with non-seasonal data in the other species, as shrew outlier genes might still reflect physiological changes that weren't active in the other species.

      There is no solution for this design flaw given the public data available to the authors except for creating matched data in the other species, which is of course not feasible. The authors should acknowledge and discuss this shortcoming in the paper.

      Related to the point above: in the section "Evolutionary Divergence in Expression" it is not clear which of the shrew samples were used. Was it all of them, or only those from winter, fall, etc? One might expect different results depending on this. E.g., there could be fewer genes with inferred adaptive change when using only summer samples. The authors should specify which samples were included in these analyses, and, if all samples were used, conduct a robustness analysis to see which of their detected genes survive the exclusion of certain time points.

      In the same section, were there also genes with lower shrew expression? None are mentioned in the text, so did the authors not test for this direction, or did they test and there were no significant hits?

      The Discussion is too long and detailed, given that it can ultimately only speculate about what the various expression changes might mean. Many of the specific points made (e.g. about the blood-brain-barrier being more permissive to sensing metabolic state, about cross-organ communication, the paragraphs on single, specific genes) are a stretch based on the available data. Illustrating this point, the one follow-up experiment the authors did (on BCL2L1) did not give the expected result. I really applaud the authors for having done this experiment, which goes beyond typical studies in this space. At the same time, its result highlights the dangers of reading too much into differential expression analyses.

      There is no test of whether the five genes observed in both analyses (seasonal change and inter-species) exceed the number expected by chance. When two gene sets are drawn at random, some overlap is expected randomly. The expected overlap can be computed by repeated draws of pairs of random sets of the same size as seen in real data and by noting the overlap between the random pairs. If this random distribution often includes sets of five genes, this weakens the conclusions that can be drawn from the genes observed in the real data.
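      The random-draw procedure described above could be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name, the gene-universe size, and the two set sizes are hypothetical placeholders, and only the observed overlap of five genes comes from the text.

```python
import random


def simulate_overlap(n_genes, size_a, size_b, observed, n_draws=10_000, seed=0):
    """Estimate the chance overlap between two random gene sets.

    Returns the mean overlap across draws and the fraction of draws whose
    overlap is at least as large as the observed one (an empirical p-value).
    """
    rng = random.Random(seed)
    universe = range(n_genes)
    overlaps = []
    for _ in range(n_draws):
        # Draw two independent gene sets of the same sizes as the real lists.
        set_a = set(rng.sample(universe, size_a))
        set_b = set(rng.sample(universe, size_b))
        overlaps.append(len(set_a & set_b))
    mean_overlap = sum(overlaps) / n_draws
    p_value = sum(o >= observed for o in overlaps) / n_draws
    return mean_overlap, p_value


# Hypothetical sizes, NOT taken from the manuscript: 10,000 tested genes,
# 200 seasonal genes, 150 inter-species outlier genes, 5 genes in both lists.
mean_overlap, p_value = simulate_overlap(10_000, 200, 150, observed=5)
print(f"mean chance overlap: {mean_overlap:.2f}")
print(f"fraction of draws with overlap >= 5: {p_value:.3f}")
```

      With the real list sizes substituted, a large fraction of random draws reaching five shared genes would indicate that the observed overlap is unremarkable. The same question can also be answered exactly with a hypergeometric test, which is a substitute for, not a description of, the simulation the reviewer proposes.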

    3. Reviewer #2 (Public review):

      Summary:

      Shrews go through winter by shrinking their brain and most organs, then regrowing them in the spring. The gene expression changes underlying this unusual brain size plasticity were unknown. Here, the authors looked for potential adaptations underlying this trait by looking at differential expression in the hypothalamus. They found enrichments for DE in genes related to the blood-brain barrier and calcium signaling, and used comparative data to look at gene expression differences that are unique in shrews. This study leverages a fascinating organismal trait to understand plasticity and what might be driving it at the level of gene expression. This manuscript also lays the groundwork for further developing this interesting system.

      Strengths:

      One strength is that the authors used OU models to look for adaptation in gene expression. The authors also added cell culture work to bolster their findings.

      Weaknesses:

      I think that there should be a bit more of an introduction to Dehnel's phenomenon, given how much it is used throughout.

    4. Reviewer #3 (Public review):

      Summary:

      In their study, the authors combine developmental and comparative transcriptomics to identify candidate genes with plastic, canalized, or lineage-specific (i.e., divergent) expression patterns associated with an unusual overwintering phenomenon (Dehnel's phenomenon - seasonal size plasticity) in the Eurasian shrew. Their focus is on the shrinkage and regrowth of the hypothalamus, a brain region that undergoes significant seasonal size changes in shrews and plays a key role in regulating metabolic homeostasis. Through combined transcriptomic analysis, they identify genes showing derived (lineage-specific), plastic (seasonally regulated), and canalized (both lineage-specific and plastic) expression patterns. The authors hypothesize that genes involved in pathways such as the blood-brain barrier, metabolic state sensing, and ion-dependent signaling will be enriched among those with notable transcriptomic patterns. They complement their transcriptomic findings with a cell culture-based functional assessment of a candidate gene believed to reduce apoptosis.

      Strengths:

      The study's rationale and its integration of developmental and comparative transcriptomics are well-articulated and represent an advancement in the field. The transcriptome, known for its dynamic and plastic nature, is also influenced by evolutionary history. The authors effectively demonstrate how multiple signals (evolutionary, constitutive, and plastic) can be extracted, quantified, and interpreted. The chosen phenotype and study system are particularly compelling: the system not only exemplifies an extreme case of Dehnel's phenotype, but the metabolic requirements of the shrew also suggest that genes regulating metabolic homeostasis are under strong selection.

      Weaknesses:

      (1) In a number of places (described in detail below), the motivation for the experimental, analytical, or visualization approach is unclear and may obscure or prevent discoveries.

      (2) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text:
      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Figure 1B).
      - The authors do not indicate whether they perform cluster-specific GO or KEGG pathway enrichment analyses. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses.

      (3) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets
      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?
      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis, but it is unclear if this was also done for the S. araneus analysis. If not, why? If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis? Sex-specific expression elevates within-group variation and could impact the discovery of differentially expressed genes.

      (4) Discussion: The term "adaptive" is used frequently and liberally throughout the discussion. The interpretation of seasonal changes in gene expression as indicators of adaptive evolution should be done cautiously as such changes do not necessarily imply causal or adaptive associations.

    5. Author response:

      In response to your comments, we will revise our manuscript to address the limitations raised, including our ability to rigorously test how observed changes in gene expression in shrews are adaptive. The phylogenetic ANOVA we use (EVE) tests for a separate RNA expression optimum specific to the shrew lineage for each gene, and is consistent with expectations for adaptive evolution of gene expression. However, as you noted, while this analysis highlights many candidate genes potentially under positive selection, further functional validation is required to confirm if and how these genes contribute to Dehnel’s phenomenon. We will emphasize in our discussion that the inferred adaptive expression of these genes is putative and outline that future studies are needed to test the function of proposed adaptations. For example, our cell line validation of the effect of BCL2L1 on apoptosis is a case study that tests the function of a putatively adaptive change in gene expression, and it illuminates this limitation. We will also refine our discussion to focus more on pathway-level analyses rather than on individual genes.

      We recognize that our methodological choices may not have been fully transparent, such as our selection of gene expression clusters for the pathway enrichment analysis and our focus on BCL2L1 for functional validation in cell lines. We will expand on these decisions in the methods section to provide greater clarity for our readers.

      Regarding the use of sex as a covariate, we acknowledge the concerns raised. In our evolutionary analyses, we maintained a balanced sex ratio when possible. EVE models handle the effect of sex on gene expression as intraspecific variation, reflective of plasticity. In shrews, however, we used males exclusively. Females were only found among juvenile individuals, and including them would have introduced developmental variation with larger, negative impacts on these results. For the seasonal data, we will now include sex as a covariate in differential expression analyses; however, our design is imbalanced with respect to sex. We will account for this limitation and discuss it further in the revised manuscript.

    1. eLife Assessment

      This study used a novel continuous dot motion decision-making task to measure participants' perception and uncertainty/confidence in a social context. The social element is that participants can see another player's responses as well as their own. The study is a useful contribution to social decision-making primarily by introducing a new task and offering solid evidence on how participants are impacted by others' decisions during continuous perceptual choices. The manuscript could be improved through streamlining, more consistent use of terms such as "dyadic" and clarification about the differences between primary uncertainty and metacognitive confidence.

    2. Reviewer #1 (Public review):

      Summary:

      This paper reports an interesting and clever task that allows the joint measurement of both perceptual judgments and confidence (or subjective motion strength) in real/continuous time. The task is used together with a social condition to identify the (incidental, task-irrelevant) impact of another player on decision-making and confidence.

      Strengths:

      The innovation on the task alone is likely to be impactful for the field, extending recent continuous report (CPR) tasks to examine other aspects of perceptual decision-making and allowing more naturalistic readouts. One interesting and novel finding is the observation of dyadic convergence of confidence estimates even when the partner is incidental to the task performance, and that dyads tend to be more risk-seeking (indicating greater confidence) than when playing solo. The paper is well-written and clear.

      Weaknesses:

      (1) One concern with the novel task is whether confidence is disambiguated from a tracking of stimulus strength or coherence. The subjects' task is to track motion direction and use the eccentricity of the joystick to control the arc of a catcher - thus implementing a real-time sensitivity to risk (peri-decision wagering). The variable-width catcher has been used to good effect in other confidence/uncertainty tasks involving learning the spread of targets (the Nassar papers). But in the context of an RDK task, one simple strategy here is to map eccentricity directly to (subjective) motion coherence - such that the joystick position at any moment in time is a vector with motion direction and strength. This would still be an interesting task - but could be solved without invoking metacognition or the need to estimate confidence in one's motion direction decision (the analyses in Supplementary Figure 2 are nice in showing a dissociation from (objective) coherence, such that even within a coherence level, changes in eccentricity scale with direction precision - but this does not get around the potential conflation of confidence with fluctuations in motion energy).

      In other words, in this deflationary framing, what the subjects might be doing is tracking two features of the world - motion strength and direction. This possibility needs to be ruled out if the authors want to claim a mapping between eccentricity and decision confidence (for instance, an ideal observer model of the task that set eccentricity proportional to instantaneous motion strength presumably would also sensibly accrue reward targets, without the need to compute confidence in the direction response). This would be straightforward to simulate and would establish a baseline model against which to compare claims about confidence (eg when evaluating additional social modulations). More generally it casts doubt on claims such as the one on line 210 that eccentricity was "chosen freely via metacognitive assessment of the current perceptual process, [and] can be treated as a proxy measure of subjective perceptual confidence."

      One route to doing this would be to ask whether the eccentricity reports show statistical signatures of confidence that have been established for more classical punctate tasks. Here a key move has been to identify qualitative patterns in the frame of reference of choice accuracy - with confidence scaling positively with stimulus strength for correct decisions, and negatively with stimulus strength for incorrect decisions (the so-called X-pattern, for instance Sanders et al. 2016 Neuron https://pubmed.ncbi.nlm.nih.gov/27151640/).
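      The qualitative X-pattern can be illustrated with a minimal signal-detection sketch. It assumes a percept drawn from a unit-variance normal distribution centred on the stimulus strength, a choice given by the sign of the percept, and confidence read out as the distance of the percept from the decision boundary. These are illustrative simplifications chosen here, not the model of the cited paper or of the manuscript, and the function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_confidence(strength):
    """Mean confidence on correct and on error trials.

    Percept x ~ N(strength, 1) with strength >= 0, choice = sign(x),
    confidence = |x| (distance from the decision boundary at zero).
    Both values are means of a truncated normal distribution.
    """
    correct = strength + normal_pdf(strength) / normal_cdf(strength)
    error = -strength + normal_pdf(strength) / normal_cdf(-strength)
    return correct, error


for strength in (0.0, 0.5, 1.0, 2.0):
    correct, error = mean_confidence(strength)
    print(f"strength {strength:.1f}: correct {correct:.3f}, error {error:.3f}")
```

      In this sketch the two curves start from the same value at zero strength and then diverge, rising for correct choices and falling for errors. That is the signature the eccentricity reports could be tested against after splitting them by choice accuracy.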

      (2) I was surprised not to see more analysis of the continuous report data as a function of (lagged) task variables. Some of this analysis is shown in Figure 2b relative to an (objective) direction change, and also in the cross-correlation plots in Supplementary Figure 1d. But to fully characterise the task behaviour it also seems important to ask how and whether fluctuations in motion energy (assuming that the RDK frames were recorded) during a steady state phase are affecting continuous reporting of direction and eccentricity, prior to asking how social information is incorporated into subjects' behaviour.

      Minor points:

      (1) Lines 295-298, isn't it guaranteed to observe these three behavioural patterns (both participants improving, both getting worse, only one improving while the other gets worse) even in random data?

      (2) Lines 703-707, it wasn't clear what the AUC values referred to here (also in Figure 3) - what are the distributions that are being compared? I think part of the confusion here comes from AUC being mentioned earlier in the paper as a measure of metacognitive sensitivity (correct vs. incorrect trial distributions), whereas my impression is that here AUC is being used to investigate differences in variables (eg confidence) between experimental conditions.

      (3) Could the findings of the worse solo player benefitting more than the better solo player (Figure 4c) be partly due to a compressive ceiling effect - eg there is less room to move up the psychometric function for the higher-scoring player?

    3. Reviewer #2 (Public review):

      Summary:

      Schneider et al examine perceptual decision-making in a continuous task setup when social information is also provided to another human (or algorithmic) partner. The authors track behaviour in a visual motion discrimination task and report accuracy, hit rate, wager, and reaction times, demonstrating that choice wager is affected by social information from the partner.

      Strengths:

      There are many things to like about this paper. The visual psychophysics has been undertaken with much expertise and care to detail. The reporting is meticulous and the coverage of the recent previous literature is reasonable. The research question is novel.

      Weaknesses:

      The paper is difficult to read. It is very densely written, with little to distinguish between what is a key message and what is an auxiliary side note. The Figures are often packed with sometimes over 10 panels and very long captions that stick to the descriptive details but avoid clarity. There is much that could be shifted to supplementary material for the reader to get to the main points.

      Example: In lines 176-181, we read about reaction times in the motion task with a level of detail and repetition that has very little relevance to the message of the paper. When we get to social condition and we read about RT in lines 239-243, it is not quite clear what it is that we should take away from this.

      Another example: the word "eccentricity" is used to refer to "deviation from central position" as a measure of wager. But we see in Figure 1 that it actually refers to the width of the ARC straddling the reported direction of motion. The confusion is compounded when we see in Figure 2b that the two subjects' different levels of confidence are (short red and long green) arcs at the SAME Eccentricity and overlap one another. The use of the word eccentricity is clearly driven by the Joystick action description and is in direct conflict with the meaning of what eccentricity is in visual perception.

      A third and very important one is what the word "dyadic" refers to in the paper. The subjects do not make any joint decisions. However, the authors calculate some "dyadic score" to measure if the group has been able to do better than individuals. So the word dyadic sometimes refers to some "nominal" group. In other places, dyadic refers to the social experimental condition. For example, we see in Figure 3c that AUC is compared for solo vs dyadic conditions. This is confusing.

      A key problem with the paper is that it introduces many terms and the main text often overlooks defining them clearly. I still do not understand the difference between Accuracy and Hit in the paper's jargon. The same goes for "score". Please note that the answer "this is defined in the supplementary method" is not acceptable. These are key constructs in the paper. The flow of the paper's main text depends on them.

    4. Author response:

      We sincerely thank you for your constructive and insightful feedback on our manuscript, including the assessment of its strengths and suggestions for improvements. This will allow us to enhance the clarity and impact of our work. In our revised manuscript, we will address your recommendations as follows:

      (1) Disambiguating whether the joystick eccentricity reflects the subject’s confidence or simply the perceived stimulus strength or coherence

      We agree that this is a pivotal issue for the interpretation of our results. We are confident that the joystick “eccentricity” (i.e., radial joystick deviation from the center) does not simply correlate with the moment-to-moment fluctuations of stimulus coherence. The observations that the radial joystick response varied considerably more than the stimulus fluctuations within each subject and each coherence level, and the analysis of metacognitive sensitivity, suggest that subjects indeed incorporated confidence judgements into their continuous reports. As proposed, we will further explore the established signatures of metacognitive confidence reports, and we will quantify the motion energy fluctuations within time intervals where the nominal stimulus parameters remained constant, to examine whether accuracy and confidence levels vary in response to these fluctuations. This approach will provide deeper insights into continuous dynamics within our paradigm.

      (2) Rationale for Social Investigation

      We will clarify the rationale and methodology of the social aspects in our experiments to better contextualize our approach and findings and their relationship to the field of collective decision-making. In particular, we will further emphasize that while our paradigm did not require participants to integrate information from their partner and offered no incentives for solving the task collectively, the participants could (and did) incorporate the social information into their judgements and mostly improved their earnings. In this way, our approach complements the studies that required joint decisions.

      (3) Streamlining and Terminology

      We will streamline the text and figure legends to present our main arguments more concisely and improve the overall flow of the manuscript. Additionally, we will add a glossary to the main text to clarify terminology, enhancing accessibility and ensuring consistent understanding of key terms throughout the paper.

      To clarify two of the points upfront, we indeed used the term “eccentricity” not in a visual science sense but as the measure of radial joystick deviation from the center and the corresponding angular width of the response arc; we now realize that this is confusing in the context of a visual psychophysics paper and will use another word. The term “dyadic” was meant to describe the experimental condition in which two participants worked on the task, and the associated measures of performance in this condition. The “dyadic score”, defined as the average score across the two participants in the dyadic condition, will be renamed “combined score”.

      (4) Incorporation of Additional Literature

      We acknowledge and appreciate the recommendations for additional relevant literature, which we will incorporate into our discussion. This will allow us to contextualize our findings more thoroughly within the existing body of research and highlight the broader implications of our work.

    1. Author response:

      eLife Assessment

      This valuable study uses consensus-independent component analysis to highlight transcriptional components (TC) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumor microenvironment. This finding is corroborated by comparing spatially resolved transcriptomics in a small-scale study; a weakness is in being descriptive, non-mechanistic, and requiring experimental validation.

      We sincerely thank the editors for the valuable and constructive feedback. We appreciate the recognition of our findings and the significance of identifying transcriptional components in high-grade serous ovarian cancers. We acknowledge the insightful point on our study's descriptive nature and limited mechanistic depth. While further experimental validation would indeed enhance our conclusions, such work extends beyond the current scope of this manuscript. However, we would like to highlight that mechanistic studies demonstrating the impact of tumor-infiltrating nerves on disease progression are emerging (Zahalka et al., 2017; Allen et al., 2018; Balood et al., 2022; Jin et al., 2022; Globig et al., 2023; Restaino et al., 2023; Darragh et al., 2024). Importantly, members of our group have contributed to these findings. These studies, including in vitro and in vivo work in head and neck squamous cell carcinoma as well as high-grade serous ovarian carcinoma, demonstrate that substance P released from tumor-infiltrating nociceptors potentiates MAP kinase signaling in cancer cells, thereby influencing disease progression. This effect can be mitigated in vivo by blocking the substance P receptor (Restaino et al., 2023). Our present work identifies a transcriptional component that aligns with the presence of functional nerves within malignancies. These published mechanistic studies support our findings and suggest that this transcriptional component could serve as a potential screening tool to identify innervated tumors. Such information is clinically relevant, as patients with innervated tumors may benefit from more aggressive therapy.

      Reviewer #1 (Public review):

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumor microenvironment. Given notable weaknesses like lack of validation cohort or validation using another platform (other than the 11 samples with ST), the data is considered highly descriptive and preliminary.

      Strengths:

      (1) Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      We sincerely thank the reviewer for recognizing the strengths and novelty of our study. We appreciate the positive feedback on our use of consensus-independent component analysis (c-ICA) to decompose bulk transcriptomes, which we believe allowed us to detect subtle transcriptional signals often overlooked in traditional analyses.

      (2) Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      Thank you for recognizing the robustness of our study through comprehensive data integration. We appreciate the acknowledgment of our efforts to leverage a large, multi-source dataset, as well as the additional insights gained from spatially resolved transcriptomes. We believe this integrative approach enhances the depth of our analysis and contributes to a more nuanced understanding of the tumor microenvironment.

      (3) Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumor microenvironment in cancer progression.

      We appreciate the reviewer’s recognition of the clinical implications of our findings. The identification of a synaptic signaling-related transcriptional component associated with poor prognosis underscores the potential for novel therapeutic targets within the tumor microenvironment. We agree that this insight could open new avenues for intervention and further highlights the role of neuronal interactions in cancer progression.

      Weaknesses:

      (1) Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      We appreciate the reviewer’s point regarding the limited mechanistic insights provided in our study. We agree that further experimental validation would enhance our understanding of how the biology captured by these transcriptional components influences cancer progression. However, we respectfully note that such validation is beyond the current scope of this article. Our current analyses are based on publicly available expression array and spatial transcriptomic array datasets. For future studies, we therefore intend to combine spatial transcriptomic data with immunohistochemical analysis of the same tumors for validation purposes. We have started setting up in vitro cocultures of neurons and ovarian cancer cells to obtain mechanistic insight into how genes with a large weight in TC121 regulate synaptic signaling and how that affects ovarian cancer cells.

      (2) Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      In Figure 5, we present the activity of TC121 across various cancer types, demonstrating broader applicability. However, due to limited treatment response data, we were unable to assess associations between TC activity scores and patient response. Additionally, transcriptomic and survival data specific to other ovarian cancer subtypes beyond HGSOC are currently not available, limiting our ability to generalize these findings to those groups. We intend to leverage survival data from TCGA to explore associations between TC activity scores and overall survival of patients with other cancer types. Nonetheless, we recognize limitations with TCGA survival data, as outlined in this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8726696/.

      (3) Innovative Methodology:

      Requires more validation using different platforms (IHC) to validate the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      We acknowledge the reviewer’s suggestion to validate our results with alternative platforms, such as IHC; however, we regret that such validation is beyond the scope of this article. Regarding data quality control, we implemented a series of checks:

      • Bulk Transcriptional Profiles: We applied principal component analysis (PCA) on the sample Pearson product-moment correlation matrix, focusing on the first principal component (PCqc), which accounted for approximately 80-90% of the variance, primarily reflecting technical rather than biological variability (Bhattacharya et al., 2020). Samples with a correlation below 0.8 with PCqc were removed as outliers. Additionally, we generated unique MD5 hashes for each CEL file to identify and exclude duplicate samples. Per gene, expression values were standardized to a mean of zero and a variance of one across the GEO, CCLE, GDSC, and TCGA datasets to minimize probeset- or gene-specific variability.

      • Spatial Transcriptional Profiles: We used PCA for quality control here as well, retaining a sample only if the loading factors for the first principal component showed consistent signs across all profiles from that individual spatial transcriptomic sample (i.e., all profiles had either positive or negative loading factors for the first PC). Samples that did not meet this criterion were excluded from analyses.
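
      The two bulk-profile checks described above (outlier removal via the first principal component of the sample correlation matrix, and duplicate detection via MD5 hashes) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: it assumes that "correlation with PCqc" means each sample's loading on the first principal component (eigenvector times the square root of the eigenvalue), it finds that component by simple power iteration, and the toy profiles, file names, and function names are all hypothetical.

```python
import hashlib
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def first_pc_loadings(profiles, n_iter=200):
    """Loading of each sample on the first PC of the sample correlation matrix.

    For PCA on a correlation matrix, loading = eigenvector * sqrt(eigenvalue),
    which equals the correlation between each sample and the component.
    """
    n = len(profiles)
    corr = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    v = [1.0] * n  # start vector for power iteration
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient of the unit vector v approximates the leading eigenvalue.
    eigval = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
    if sum(v) < 0:  # eigenvector sign is arbitrary; orient it positively
        v = [-c for c in v]
    return [c * math.sqrt(eigval) for c in v]


def flag_outliers(profiles, threshold=0.8):
    """Indices of samples whose loading on the first PC falls below the threshold."""
    loadings = first_pc_loadings(profiles)
    return [i for i, loading in enumerate(loadings) if loading < threshold]


def find_duplicates(files):
    """Return (duplicate_name, first_seen_name) pairs for byte-identical contents."""
    seen, duplicates = {}, []
    for name, content in files.items():
        digest = hashlib.md5(content).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return duplicates


# Hypothetical toy data: three samples share one trend, the fourth does not.
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_profiles = [
    trend,
    [2 * g + 1 for g in trend],
    [3 * g - 2 for g in trend],
    [1.0, -1.0, 0.0, 0.0, -1.0, 1.0],  # uncorrelated with the shared trend
]
# Hypothetical stand-ins for raw CEL file contents.
toy_files = {
    "a.CEL": b"probe intensities 1",
    "b.CEL": b"probe intensities 2",
    "c.CEL": b"probe intensities 1",
}

outlier_indices = flag_outliers(toy_profiles)
duplicate_pairs = find_duplicates(toy_files)
```

      In this toy case the fourth profile is flagged as an outlier and `c.CEL` is reported as a duplicate of `a.CEL`. A real analysis would read the expression matrix and CEL files from disk and would more likely use a numerical library for the eigendecomposition.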

      (4) Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Probably given the lack of some QA/QC procedures it'll be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

      While this study is exploratory in nature, we agree that future studies should focus on validating these potential drug targets in clinical settings. As suggested, QA/QC procedures were integral to our analyses. We applied rigorous quality control, including PCA-based checks and duplicate removal across datasets, to ensure data integrity (detailed in our previous response).

      In terms of clinical application, which we partially discussed in the manuscript, we will discuss additional strategies to prevent synaptic signaling and neurotransmitter release in the tumor microenvironment (TME). Drugs such as ifenprodil and lamotrigine are used in treating neuronal disorders to block glutamate release responsible for subsequent synaptic signaling, whereas the vesicular monoamine transporter (VMAT) inhibitor reserpine can block the formation of synaptic vesicles (Reid et al., 2013; Williams et al., 2001). Previous in vitro studies with HGSOC cell lines showed a significant effect of ifenprodil alone on cancer cell proliferation, whereas reserpine seemed to trigger apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). Such strategies could potentially be used to inhibit synaptic neurotransmission in the TME.

      Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data that are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns that are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC could lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (Knapen et al. (2024) Commun. Med).

      Strengths:

      (1) Overall, this study describes a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      We thank the reviewer for acknowledging the strength of our data-driven approach and the use of consensus-independent component analysis (c-ICA) to identify transcriptional components within HGSOC microarray data. We aimed to provide comprehensive methodological detail and supplementary documentation to support the reproducibility and robustness of our findings. We believe this approach allows for the identification of subtle transcriptional signals that might be overlooked by traditional analysis methods.

      (2) The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      We appreciate the reviewer’s positive feedback on the biological interpretation of our transcriptional components. We are pleased that our approach, which includes data-driven permutation testing and analyses of associations with copy-number alterations, gene sets, and prognostic outcomes, was found convincing. These analyses were integral to enhancing the robustness and biological relevance of our findings.

      (3) The resulting annotated transcriptional components have been made available in a searchable online format.

      Thank you for acknowledging the availability of our annotated transcriptional components in a searchable online format.

      (4) For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers appears to support this preliminary finding and requires further mechanistic follow-up.

      Thank you for acknowledging the accessibility of our annotated transcriptional components. We prioritized making these data available in a searchable online format to facilitate further research and enable the community to explore and validate our findings.

      Weaknesses:

      (1) This study has not explicitly compared the c-ICA transcriptional components to the existing reported transcriptional landscape and classifications for ovarian cancers (e.g. Smith et al Nat Comms 2023; TCGA Nature 2011; Engqvist et al Sci Rep 2020) which would enable a further assessment of the additional contribution of c-ICA - whether the c-ICA approach captured entirely complementary components, or whether some components are correlated with the existing reported ovarian transcriptomic classifications.

      We appreciate the reviewer’s insightful suggestion to compare our c-ICA-derived transcriptional components with previously reported ovarian cancer classifications, such as those from Smith et al. (2023), TCGA (2011), and Engqvist et al. (2020). To address this, we will incorporate analyses comparing the activity scores of our transcriptional components with these published landscapes and classifications, particularly focusing on any associations with overall survival. Additionally, we plan to evaluate correlations between gene signatures from these studies and our identified TCs, enhancing our understanding of the unique contributions of the c-ICA approach.

      (2) Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics due to the presence of cells from tumour cells and the tumour microenvironment. However, c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods that explicitly use a prior cell signature matrix.

      Thank you for highlighting this nuanced aspect of c-ICA interpretation. We acknowledge that c-ICA, unlike traditional deconvolution methods, is not specifically designed for cell-type deconvolution and does not rely on a predefined cell signature matrix. While we explored the transcriptional components in the context of tumor and microenvironmental interactions, we agree that these components may not correspond directly to distinct cell types but rather reflect complex patterns of dysregulation, potentially within individual cell populations.

      Our goal with c-ICA was to uncover hidden transcriptional patterns possibly influenced by cellular heterogeneity. However, we recognize these patterns may also arise from regulatory processes within a single cell type. To investigate further, we plan to use single-cell transcriptional data (~60,000 cell-type-annotated profiles from GSE158722) and project our transcriptional components onto these profiles to obtain activity scores, allowing us to assess each TC’s behavior across diverse cellular contexts after removing the first principal component to minimize background effects.
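
      As a rough illustration of what projecting a transcriptional component onto expression profiles could look like, the sketch below computes an activity score as the weighted sum of a profile's expression values over the genes it shares with the component. The response does not specify the projection, so this dot-product form is an assumption, and the gene names, weights, and profiles are hypothetical. The removal of the first principal component mentioned above is not shown.

```python
def activity_scores(tc_weights, profiles):
    """Project each profile onto one transcriptional component.

    tc_weights: dict mapping gene -> weight in the component.
    profiles:   dict mapping profile id -> dict of gene -> expression value.
    Only genes present in both the component and the profile contribute.
    """
    scores = {}
    for profile_id, expression in profiles.items():
        shared_genes = [gene for gene in tc_weights if gene in expression]
        scores[profile_id] = sum(tc_weights[g] * expression[g] for g in shared_genes)
    return scores


# Hypothetical component weights and two hypothetical single-cell profiles.
toy_weights = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}
toy_cells = {
    "cell_1": {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 2.0},
    "cell_2": {"GENE_A": -1.0, "GENE_B": 1.0},  # GENE_C not measured
}

toy_scores = activity_scores(toy_weights, toy_cells)
```

      Under these assumptions, a cell whose expression follows the signs of the component weights receives a positive score and a cell with the opposite pattern receives a negative one.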

      References

      Allen JK, Armaiz-Pena GN, Nagaraja AS, Sadaoui NC, Ortiz T, Dood R, Ozcan M, Herder DM, Haemerrle M, Gharpure KM, Rupaimoole R, Previs R, Wu SY, Pradeep S, Xu X, Han HD, Zand B, Dalton HJ, Taylor M, Hu W, Bottsford-Miller J, Moreno-Smith M, Kang Y, Mangala LS, Rodriguez-Aguayo C, Sehgal V, Spaeth EL, Ram PT, Wong ST, Marini FC, Lopez-Berestein G, Cole SW, Lutgendorf SK, diBiasi M, Sood AK. 2018. Sustained adrenergic signaling promotes intratumoral innervation through BDNF induction. Cancer Res 78:canres.1701.2016.

      Balood M, Ahmadi M, Eichwald T, Ahmadi A, Majdoubi A, Roversi Karine, Roversi Katiane, Lucido CT, Restaino AC, Huang S, Ji L, Huang K-C, Semerena E, Thomas SC, Trevino AE, Merrison H, Parrin A, Doyle B, Vermeer DW, Spanos WC, Williamson CS, Seehus CR, Foster SL, Dai H, Shu CJ, Rangachari M, Thibodeau J, Rincon SVD, Drapkin R, Rafei M, Ghasemlou N, Vermeer PD, Woolf CJ, Talbot S. 2022. Nociceptor neurons affect cancer immunosurveillance. Nature 611:405–412.

      Bhattacharya A, Bense RD, Urzúa-Traslaviña CG, Vries EGE de, Vugt MATM van, Fehrmann RSN. 2020. Transcriptional effects of copy number alterations in a large set of human cancers. Nat Commun 11:715.

      Darragh LB, Nguyen A, Pham TT, Idlett-Ali S, Knitz MW, Gadwa J, Bukkapatnam S, Corbo S, Olimpo NA, Nguyen D, Court BV, Neupert B, Yu J, Ross RB, Corbisiero M, Abdelazeem KNM, Maroney SP, Galindo DC, Mukdad L, Saviola A, Joshi M, White R, Alhiyari Y, Samedi V, Bokhoven AV, John MSt, Karam SD. 2024. Sensory nerve release of CGRP increases tumor growth in HNSCC by suppressing TILs. Med 5:254-270.e8.

      Globig A-M, Zhao S, Roginsky J, Maltez VI, Guiza J, Avina-Ochoa N, Heeg M, Hoffmann FA, Chaudhary O, Wang J, Senturk G, Chen D, O’Connor C, Pfaff S, Germain RN, Schalper KA, Emu B, Kaech SM. 2023. The β1-adrenergic receptor links sympathetic nerves to T cell exhaustion. Nature 622:383–392.

      Jin M, Wang Y, Zhou T, Li W, Wen Q. 2022. Norepinephrine/β2-adrenergic receptor pathway promotes the cell proliferation and nerve growth factor production in triple-negative breast cancer. J Breast Cancer 26:268–285.

    2. eLife Assessment

      This valuable study uses consensus-independent component analysis to highlight transcriptional components (TC) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumor microenvironment. This finding is corroborated by comparing spatially resolved transcriptomics in a small-scale study; a weakness is in being descriptive, non-mechanistic, and requiring experimental validation.

    3. Reviewer #1 (Public review):

      Summary:

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumor microenvironment. Given notable weaknesses like lack of validation cohort or validation using another platform (other than the 11 samples with ST), the data is considered highly descriptive and preliminary.

      Strengths:

      (1) Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      (2) Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      (3) Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumor microenvironment in cancer progression.

      Weaknesses:

      (1) Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      (2) Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      (3) Innovative Methodology:

      Requires more validation using different platforms (IHC) to validate the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      (4) Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Probably given the lack of some QA/QC procedures it'll be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

    4. Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data that are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns that are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC could lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (Knapen et al. (2024) Commun. Med).

      Strengths:

      Overall, this study describes a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      The resulting annotated transcriptional components have been made available in a searchable online format.

      For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers appears to support this preliminary finding and requires further mechanistic follow-up.

      Weaknesses:

      This study has not explicitly compared the c-ICA transcriptional components to the existing reported transcriptional landscape and classifications for ovarian cancers (e.g. Smith et al Nat Comms 2023; TCGA Nature 2011; Engqvist et al Sci Rep 2020) which would enable a further assessment of the additional contribution of c-ICA - whether the c-ICA approach captured entirely complementary components, or whether some components are correlated with the existing reported ovarian transcriptomic classifications.

      Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics due to the presence of cells from tumour cells and the tumour microenvironment.

      However, c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods that explicitly use a prior cell signature matrix.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the present study, Chen et al. investigate the role of Endophilin A1 in regulating GABAergic synapse formation and function. To this end, the authors use constitutive or conditional knockout of Endophilin A1 (EEN1) to assess the consequences on GABAergic synapse composition and function, as well as the outcome for PTZ-induced seizure susceptibility. The authors show that EEN1 KO mice show a higher susceptibility to PTZ-induced seizures, accompanied by a reduction in the GABAergic synaptic scaffolding protein gephyrin as well as specific GABAAR subunits and eIPSCs. The authors then investigate the underlying mechanisms, demonstrating that Endophilin A1 binds directly to gephyrin and GABAAR subunits, and identifying the subdomains of Endophilin A1 that contribute to this effect. Overall, the authors state that their study places Endophilin A1 as a new regulator of GABAergic synapse function.

      Strengths:

      Overall, the topic of this manuscript is very timely, since there has been substantial recent interest in describing the mechanisms governing inhibitory synaptic transmission at GABAergic synapses. The study will therefore be of interest to a wide audience of neuroscientists studying synaptic transmission and its role in disease. The manuscript is well-written and contains a substantial quantity of data.

      Weaknesses:

      A number of questions remain to be answered in order to be able to fully evaluate the quality and conclusions of the study. In particular, a key concern throughout the manuscript regards the way that the number of samples for statistical analysis is defined, which may affect the validity of the data analysed. Addressing this weakness will be essential to providing conclusive results that support the authors' claims.

      We would like to thank the reviewer for appreciating the value of our study and for the careful critique, which will help us improve the manuscript. As suggested, we will correct the way the number of samples for statistical analysis is defined throughout the manuscript and update the figures, figure legends, and Materials and Methods accordingly. For example, we will average the values for all dendritic segments from one neuron, so that each data point in the graphs represents one neuron.
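
      The change in the unit of analysis described here can be sketched as follows: segment-level measurements are grouped by neuron and averaged, so that n counts neurons and not dendritic segments. The neuron identifiers and values are hypothetical and serve only to illustrate the aggregation step.

```python
from collections import defaultdict
from statistics import fmean


def per_neuron_means(segment_values):
    """Average all dendritic-segment measurements belonging to the same neuron.

    segment_values: iterable of (neuron_id, measurement) pairs.
    Returns a dict with one averaged value per neuron.
    """
    grouped = defaultdict(list)
    for neuron_id, value in segment_values:
        grouped[neuron_id].append(value)
    return {neuron_id: fmean(values) for neuron_id, values in grouped.items()}


# Hypothetical measurements: five segments drawn from two neurons.
toy_segments = [
    ("neuron_1", 4.0),
    ("neuron_1", 6.0),
    ("neuron_2", 3.0),
    ("neuron_2", 5.0),
    ("neuron_2", 7.0),
]

neuron_means = per_neuron_means(toy_segments)
sample_size = len(neuron_means)  # 2 neurons, not 5 segments
```

      Statistical tests would then be run on the per-neuron values, which avoids treating segments from the same cell as independent observations.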

      Reviewer #2 (Public review):

      Summary:

      The function of neural circuits relies heavily on the balance of excitatory and inhibitory inputs. Particularly, inhibitory inputs are understudied when compared to their excitatory counterparts due to the diversity of inhibitory neurons, their synaptic molecular heterogeneity, and their elusive signature. Thus, insights into these aspects of inhibitory inputs can inform us largely on the functions of neural circuits and the brain.

      Endophilin A1, an endocytic protein heavily expressed in neurons, has been implicated in numerous pre- and postsynaptic functions, although largely at excitatory synapses. Thus, whether this crucial protein plays any role at inhibitory synapses, and whether this regulates functions at the synaptic, circuit, or brain level, remains to be determined.

      New Findings:

      (1) Endophilin A1 interacts with the postsynaptic scaffolding protein gephyrin at inhibitory postsynaptic densities within excitatory neurons.

      (2) Endophilin A1 promotes the organization of the inhibitory postsynaptic density and the subsequent recruitment/stabilization of GABA A receptors via Endophilin A1's membrane binding and actin polymerization activities.

      (3) Loss of Endophilin A1 in CA1 mouse hippocampal pyramidal neurons weakens inhibitory input and leads to susceptibility to epilepsy.

      (4) Thus the authors propose that via its role as a component of the inhibitory postsynaptic density within excitatory neurons, Endophilin A1 supports the organization, stability, and efficacy of inhibitory input to maintain the excitatory/inhibitory balance critical for brain function.

      (5) The conclusion of the manuscript is well supported by the data but will be strengthened by addressing our list of concerns and experiment suggestions.

      We would like to thank the reviewer for their favorable impression of the manuscript. We also appreciate the helpful experimental suggestions, which will help us improve the manuscript.

      Weaknesses:

      Technical concerns:

      (1) Figure 1F and Figure 1H, Figures 7H,J:

      Can the authors justify using a paired-pulse interval of 50 ms for eEPSCs and an interval of 200 ms for eIPSCs? Otherwise, experiments should be repeated using the same paired pulse interval.

      We apologize for the confusion. As illustrated by the schematic current traces, the decay time constants of eEPSCs and eIPSCs in hippocampal CA1 neurons are different. The eEPSCs exhibit a faster channel closing rate, corresponding to a smaller time constant Tau. Thus, a shorter inter-stimulus interval (50 ms) was chosen for paired-pulse ratio (PPR) recordings. In contrast, the eIPSCs display a slower channel closing rate, with a Tau value larger than that of eEPSCs, so a longer inter-stimulus interval (200 ms) was used for PPR. This protocol has long been established and adopted in previous studies (please see below for examples).

      Contractor, A., Swanson, G. & Heinemann, S. F. Kainate receptors are involved in short- and long-term plasticity at mossy fiber synapses in the hippocampus. Neuron 29, 209-216, doi:10.1016/s0896-6273(01)00191-x (2001).

      Babiec, W. E., Jami, S. A., Guglietta, R., Chen, P. B. & O'Dell, T. J. Differential Regulation of NMDA Receptor-Mediated Transmission by SK Channels Underlies Dorsal-Ventral Differences in Dynamics of Schaffer Collateral Synaptic Function. Journal of Neuroscience 37, 1950-1964, doi:10.1523/JNEUROSCI.3196-16.2017 (2017).

      (2) Figures 3G,H,I:

      While 3D representations of proteins of interest bolster claims made by superresolution microscopy, SIM resolution is unreliable when deciphering the localization of proteins at the subsynaptic level given the small size of these structures (<1 micrometer). In order to determine the actual location of Endophilin A1, especially given the known presynaptic localization of this protein, the authors should complete SIM experiments with a presynaptic marker, perhaps an active zone protein, so that the relative localization of Endophilin A1 can be gleaned. Currently, overlapping signals could stem from the presynapse given the poor resolution of SIM in this context.

      Thanks for your suggestions. It is certainly preferable to investigate the relative localization of endophilin A1 using both presynaptic and postsynaptic markers. For SIM imaging in Figure 3G-I, to visualize neuronal morphology, we immunostained GFP as cell fill, leaving two other channels for detection of immunofluorescent signals of endophilin A1 and another protein. We will try co-immunostaining of endophilin A1, the active zone protein bassoon (presynaptic marker) and gephyrin without morphology labeling. Alternatively, we will do co-staining of endophilin A1 and bassoon in GFP-expressing neurons. We agree that overlapping signals or proximal localization of presynaptic endophilin A1 with gephyrin or GABAAR γ2 cannot be ruled out. Of note, as image resolution improves with more advanced imaging systems, the apparent overlap between two proteins becomes smaller or even disappears. With the ~110 nm lateral resolution of SIM microscopy, the degree of overlap between the two proteins of interest is much lower than in confocal microscopy. Given the presynaptic localization of endophilin, most likely we will observe a small overlap (presynaptic) or proximal localization (postsynaptic) of endophilin A1 with bassoon. Nevertheless, we will complete the SIM experiments as suggested to improve the manuscript.

      Manuscript consistency:

      (1) Figure 2:

      The authors looked at VGAT and noticed a reduction of signals in hippocampal regions in their P21 slices, indicating that the proposed postsynaptic organization/stabilization functions of Endophilin A1 extend to the inhibitory presynapse, perhaps via Neuroligin 2-Neurexin. Simultaneously, hippocampal regions in P21 slices showed a reduction in PSD-95 signals, indicating that excitatory synapses are also affected. It would be crucial to also look at excitatory presynapses, via VGLUT staining, to assess whether EndoA1 -/- also affects presynapses. Given the extensive roles of Endophilin A1 in presynapses, especially in excitatory presynapses, this should be investigated.

      Thanks for the thoughtful comments. Given that both VGAT and PSD95 signals are reduced in hippocampal regions in P21 slices, it is conceivable that the proposed postsynaptic organization/stabilization functions of endophilin A1 extend to the inhibitory presynapse via Neuroligin-2-Neurexin, and to the excitatory presynapse as well, during development. Of note, endophilin A1 knockout did not impair the distribution of Neuroligin-2 in inhibitory postsynapses (immunoisolated with anti-GABAAR α1) in mature mice (Figure 3K), and endophilin A1 did not bind to Neuroligin-2 (Figure 4D), suggesting that endophilin A1 might function via other mechanisms. Nevertheless, as functions of endophilin A family members at the presynaptic site are well-established, the reduction of presynaptic signals in developmental hippocampal regions of EndoA-/- mice might result from the depletion of presynaptic endophilin A1. The presynaptic deficits may be compensated by other mechanisms as neurons mature. Certainly, we will do VGLUT staining of EndoA1-/- brain slices as suggested to assess the role of endophilin A1 in excitatory presynapses in vivo.

      (2) Figure 7C:

      The authors do not assess whether p140Cap overexpression rescues the GABAAR loss exhibited in Endophilin A1 KO, as they did for gephyrin. This would be an important data point to show, as p140Cap may somehow rescue receptor loss by another pathway. In fact, it is mentioned in the text that this experiment was done, "Consistently, neither p140Cap nor the endophilin A1 loss-of-function mutants could rescue the GABAAR clustering phenotype in EEN1 KO neurons (Figure 7C, D)" yet the data for p140Cap overexpression seem to be missing. This should be remedied.

      Thanks a lot for the thoughtful comment. We will determine whether p140Cap overexpression also rescues the GABAAR clustering phenotype in EndoA1-/- neurons by surface GABAAR γ2 staining in our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      Chen et al. identify endophilin A1 as a novel component of the inhibitory postsynaptic scaffold. Their data show impaired evoked inhibitory synaptic transmission in CA1 neurons of mice lacking endophilin A1, and an increased susceptibility to seizures. Endophilin can interact with the postsynaptic scaffold protein gephyrin and promote assembly of the inhibitory postsynaptic element. Endophilin A1 is known to play a role in presynaptic terminals and in dendritic spines, but a role for endophilin A1 at inhibitory postsynaptic densities has not yet been described.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture, and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and the data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires a more robust analysis to be convincing.

      We greatly appreciate the positive comment on our study and the very valuable feedback, which will help us improve the manuscript. We will conduct additional experiments to improve our data quality and strengthen our evidence according to these constructive suggestions. To obtain strong evidence for the interaction between endophilin A1 and gephyrin, we will perform an in vitro pull-down assay with recombinant proteins from a bacterial expression system.

    2. eLife Assessment

      This study presents a valuable finding on the molecular mechanisms that govern GABAergic inhibitory synapse function. The authors propose that Endophilin A1 serves as a novel regulator of GABAergic synapses by acting as a component of the inhibitory postsynaptic density. Although the study employs a variety of approaches to address this question, the current analysis is incomplete and requires further experiments to substantiate the claims fully. The findings are likely to interest a broad audience of scientists focusing on inhibitory synaptic transmission, the excitation-inhibition balance, and its disruption in disorders such as epilepsy.

    3. Reviewer #1 (Public review):

      Summary:

      In the present study, Chen et al. investigate the role of Endophilin A1 in regulating GABAergic synapse formation and function. To this end, the authors use constitutive or conditional knockout of Endophilin A1 (EEN1) to assess the consequences on GABAergic synapse composition and function, as well as the outcome for PTZ-induced seizure susceptibility. The authors show that EEN1 KO mice show a higher susceptibility to PTZ-induced seizures, accompanied by a reduction in the GABAergic synaptic scaffolding protein gephyrin as well as specific GABAAR subunits and eIPSCs. The authors then investigate the underlying mechanisms, demonstrating that Endophilin A1 binds directly to gephyrin and GABAAR subunits, and identifying the subdomains of Endophilin A1 that contribute to this effect. Overall, the authors state that their study places Endophilin A1 as a new regulator of GABAergic synapse function.

      Strengths:

      Overall, the topic of this manuscript is very timely, since there has been substantial recent interest in describing the mechanisms governing inhibitory synaptic transmission at GABAergic synapses. The study will therefore be of interest to a wide audience of neuroscientists studying synaptic transmission and its role in disease. The manuscript is well-written and contains a substantial quantity of data.

      Weaknesses:

      A number of questions remain to be answered in order to be able to fully evaluate the quality and conclusions of the study. In particular, a key concern throughout the manuscript regards the way that the number of samples for statistical analysis is defined, which may affect the validity of the data analysed. Addressing this weakness will be essential to providing conclusive results that support the authors' claims.

    4. Reviewer #2 (Public review):

      Summary:

      The function of neural circuits relies heavily on the balance of excitatory and inhibitory inputs. Particularly, inhibitory inputs are understudied when compared to their excitatory counterparts due to the diversity of inhibitory neurons, their synaptic molecular heterogeneity, and their elusive signature. Thus, insights into these aspects of inhibitory inputs can inform us largely on the functions of neural circuits and the brain.

      Endophilin A1, an endocytic protein heavily expressed in neurons, has been implicated in numerous pre- and postsynaptic functions, though largely at excitatory synapses. Thus, whether this crucial protein plays any role at inhibitory synapses, and whether this regulates functions at the synaptic, circuit, or brain level, remains to be determined.

      New Findings:

      (1) Endophilin A1 interacts with the postsynaptic scaffolding protein gephyrin at inhibitory postsynaptic densities within excitatory neurons.

      (2) Endophilin A1 promotes the organization of the inhibitory postsynaptic density and the subsequent recruitment/stabilization of GABA A receptors via Endophilin A1's membrane binding and actin polymerization activities.

      (3) Loss of Endophilin A1 in CA1 mouse hippocampal pyramidal neurons weakens inhibitory input and leads to susceptibility to epilepsy.

      (4) Thus the authors propose that via its role as a component of the inhibitory postsynaptic density within excitatory neurons, Endophilin A1 supports the organization, stability, and efficacy of inhibitory input to maintain the excitatory/inhibitory balance critical for brain function.

      (5) The conclusion of the manuscript is well supported by the data but will be strengthened by addressing our list of concerns and experiment suggestions.

      Weaknesses:

      Technical concerns:

      (1) Figure 1F and Figure 1H, Figures 7H,J:

      Can the authors justify using a paired-pulse interval of 50 ms for eEPSCs and an interval of 200 ms for eIPSCs? Otherwise, experiments should be repeated using the same paired pulse interval.

      (2) Figures 3G,H,I:

      While 3D representations of proteins of interest bolster claims made by superresolution microscopy, SIM resolution is unreliable when deciphering the localization of proteins at the subsynaptic level given the small size of these structures (<1 micrometer). In order to determine the actual location of Endophilin A1, especially given the known presynaptic localization of this protein, the authors should complete SIM experiments with a presynaptic marker, perhaps an active zone protein, so that the relative localization of Endophilin A1 can be gleaned. Currently, overlapping signals could stem from the presynapse given the poor resolution of SIM in this context.

      Manuscript consistency:

      (1) Figure 2:

      The authors looked at VGAT and noticed a reduction of signals in hippocampal regions in their P21 slices, indicating that the proposed postsynaptic organization/stabilization functions of Endophilin A1 extend to the inhibitory presynapse, perhaps via Neuroligin 2-Neurexin. Simultaneously, hippocampal regions in P21 slices showed a reduction in PSD-95 signals, indicating that excitatory synapses are also affected. It would be crucial to also look at excitatory presynapses, via VGLUT staining, to assess whether EndoA1 -/- also affects presynapses. Given the extensive roles of Endophilin A1 in presynapses, especially in excitatory presynapses, this should be investigated.

      (2) Figure 7C:

      The authors do not assess whether p140Cap overexpression rescues the GABAAR loss exhibited in Endophilin A1 KO, as they did for gephyrin. This would be an important data point to show, as p140Cap may somehow rescue receptor loss by another pathway. In fact, it is mentioned in the text that this experiment was done, "Consistently, neither p140Cap nor the endophilin A1 loss-of-function mutants could rescue the GABAAR clustering phenotype in EEN1 KO neurons (Figure 7C, D)" yet the data for p140Cap overexpression seem to be missing. This should be remedied.

    5. Reviewer #3 (Public review):

      Summary:

      Chen et al. identify endophilin A1 as a novel component of the inhibitory postsynaptic scaffold. Their data show impaired evoked inhibitory synaptic transmission in CA1 neurons of mice lacking endophilin A1, and an increased susceptibility to seizures. Endophilin can interact with the postsynaptic scaffold protein gephyrin and promote assembly of the inhibitory postsynaptic element. Endophilin A1 is known to play a role in presynaptic terminals and in dendritic spines, but a role for endophilin A1 at inhibitory postsynaptic densities has not yet been described.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture, and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and the data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires a more robust analysis to be convincing.

    1. eLife Assessment

      Computational simulation of neuron function depends on a collection of morphological properties and ion channel biophysics. This manuscript introduces DendroTweaks, a useful web application and Python library that, compared to existing modeling tools, eases interactive graphical exploration, development, and validation of single-neuron models. The authors provide a convincing demonstration that their software aids with building intuition and rapid prototyping of biophysical models of neurons, which improves the accessibility of dendritic simulation.

    2. Reviewer #1 (Public review):

      Summary:

      DendroTweaks provides its users with a solid tool to implement, visualize, tune, validate, understand, and reduce single-neuron models that incorporate complex dendritic arbors with differential distribution of biophysical mechanisms. The visualization of dendritic segments and biophysical mechanisms therein provides users with an intuitive way to understand and appreciate dendritic physiology.

      Strengths:

      (1) The visualization tools are simplified, elegant, and intuitive.

      (2) The ability to build single-neuron models using simple and intuitive interfaces.

      (3) The ability to validate models with different measurements.

      (4) The ability to systematically and progressively reduce morphologically-realistic neuronal models.

      Weaknesses:

      (1) Inability to account for neuron-to-neuron variability in structural, biophysical, and physiological properties in the model-building and validation processes.

      (2) Inability to account for the many-to-many mapping between ion channels and physiological outcomes. Reliance on hand-tuning provides a single biased model that does not respect pronounced neuron-to-neuron variability observed in electrophysiological measurements.

      (3) Lack of a demonstration on how to connect reduced models into a network within the toolbox.

      (4) Lack of a set of tutorials, which is common across many "Tools and Resources" papers, that would be helpful in users getting acquainted with the toolbox.

    3. Reviewer #2 (Public review):

      The paper by Makarov et al. describes the software tool called DendroTweaks, intended for the examination of multi-compartmental biophysically detailed neuron models. It offers extensive capabilities for working with very complex distributed biophysical neuronal models and should be a useful addition to the growing ecosystem of tools for neuronal modeling.

      Strengths

      (1) This Python-based tool allows for visualization of a neuronal model's compartments.

      (2) The tool works with morphology reconstructions in the widely used .swc and .asc formats.

      (3) It can support many neuronal models using the NMODL language, which is widely used for neuronal modeling.

      (4) It permits one to plot the properties of linear and non-linear conductances in every compartment of a neuronal model, facilitating examination of the model's details.

      (5) DendroTweaks supports manipulation of the model parameters and morphological details, which is important for the exploration of the relations of the model composition and parameters with its electrophysiological activity.

      (6) The paper is very well written - everything is clear, and the capabilities of the tool are described and illustrated with great attention to detail.

      Weaknesses

      (1) Not a really big weakness, but it would be really helpful if the authors showed how the performance of their tool scales. This can be done for an increasing number of compartments - how long does it take to carry out typical procedures in DendroTweaks, on a given hardware, for a cell model with 100 compartments, 200, 300, and so on? This information will be quite useful to understand the applicability of the software.

      (2) Let me also add here a few suggestions (not weaknesses, but something that can be useful, and if the authors can easily add some of these for publication, that would strongly increase the value of the paper).

      (3) It would be very helpful to add functionality to read major formats in the field, such as NeuroML and SONATA.

      (4) Visualization is available as a static 2D projection of the cell's morphology. It would be nice to implement 3D interactive visualization.

      (5) It is nice that DendroTweaks can modify the models, such as revising the radii of the morphological segments or ionic conductances. It would be really useful then to have the functionality for writing the resulting models into files for subsequent reuse.

      (6) If I didn't miss something, it seems that DendroTweaks supports the allocation of groups of synapses, where all synapses in a group receive the same type of Poisson spike train. It would be very useful to provide more flexibility. One option is to leverage the SONATA format, which has ample functionality for specifying such diverse inputs.

      (7) "Each session can be saved as a .json file and reuploaded when needed" - do these files contain the whole history of the session or the exact snapshot of what is visualized when the file is saved? If the latter, which variables are saved, and which are not? Please clarify.

    4. Author response:

      Public Reviews:

      Summary:

      We sincerely thank the reviewers for their insightful and thorough feedback. Their comments cover both technical and conceptual aspects of our project, which we have attempted to address in our provisional responses.

      First, we would like to clarify that any current lack of documentation or technical issues (such as local installation challenges) reflect the software's early stage. These aspects are receiving our full attention and are not intended to remain in their current state. As suggested, we plan to enhance the toolbox’s structure by separating it into a standalone library and a web application, alongside developing smaller satellite apps for SWC and MOD file management. We will also expand our documentation, provide a more detailed user guide, and add video tutorials for the GUI.

      Second, we have clarified the rationale behind specific implementation choices in our software, explaining why certain features of the toolbox were designed and implemented in particular ways. Our goal is to maintain a strong focus on single-cell level modeling, addressing its various aspects in great detail. We are also working on new features, such as automated parameter optimization and support for multiple output formats, to further enrich the toolbox’s functionality.

      Reviewer #1 (Public review):

      Summary:

      DendroTweaks provides its users with a solid tool to implement, visualize, tune, validate, understand, and reduce single-neuron models that incorporate complex dendritic arbors with differential distribution of biophysical mechanisms. The visualization of dendritic segments and biophysical mechanisms therein provides users with an intuitive way to understand and appreciate dendritic physiology.

      Strengths:

      (1) The visualization tools are simplified, elegant, and intuitive.

      (2) The ability to build single-neuron models using simple and intuitive interfaces.

      (3) The ability to validate models with different measurements.

      (4) The ability to systematically and progressively reduce morphologically-realistic neuronal models.

      We thank the reviewer for their positive comments.

      Weaknesses:

      (1) Inability to account for neuron-to-neuron variability in structural, biophysical, and physiological properties in the model-building and validation processes.

      We agree with the reviewer that it is important to account for neuron-to-neuron variability. The core approach of DendroTweaks and its distinctive feature is interactive exploration of how morpho-electric parameters affect neuronal activity. In light of this, variability can be achieved through interactive updating of the model parameters with widgets. In a sense, by adjusting a widget (e.g., channel distribution or kinetics), a user ends up with a new instance of a cell in the parameter space and receives almost real-time feedback on how this change affects neuronal activity. Implementing complex algorithms to account for neuron-to-neuron variability during the validation process would detract from the interactivity aspect of the GUI. That being said, we acknowledge the importance of this issue and we will explore the options to address it more comprehensively in our revised manuscript.

      (2) Inability to account for the many-to-many mapping between ion channels and physiological outcomes. Reliance on hand-tuning provides a single biased model that does not respect pronounced neuron-to-neuron variability observed in electrophysiological measurements.

      We acknowledge the challenge of accounting for degeneracy in the relation between ion channels and physiological outcomes and the importance of capturing neuron-to-neuron variability. One possible way to address this, as we mention in the Discussion, is to integrate automated parameter optimization algorithms alongside the existing interactive hand-tuning with widgets. We are currently exploring the possibility of integrating Jaxley (Deistler et al., 2024) into DendroTweaks in addition to NEURON. This would allow for automated and fast gradient-based parameter optimization, including optimization of heterogeneous channel distributions.

      (3) Lack of a demonstration on how to connect reduced models into a network within the toolbox.

      Building a network of reduced models is a promising direction, although it goes beyond the scope of this manuscript. We do not plan to add support for network models to the toolbox itself. In DendroTweaks, we focus on single-cell modeling, aiming to cover its various aspects in great detail. Of course, such refined single-cell models (both detailed and reduced) are likely to be integrated into networks, but this will not take place within the DendroTweaks toolbox. To support the integration of DendroTweaks-produced model neurons into networks, we will focus on better compatibility with existing formats and standards and improve exporting capabilities. It is already possible to export reduced morphologies as SWC files, standardized ion channel models as MOD files, and channel distributions as JSON files. Nevertheless, as a proof of concept, we plan to generate a simple network of exported reduced models outside the toolbox and include it as a separate Jupyter notebook.

      (4) Lack of a set of tutorials, which is common across many "Tools and Resources" papers, that would be helpful in users getting acquainted with the toolbox.

      This is a valid concern that we aim to address promptly. Currently, an online user guide is available at https://dendrotweaks.dendrites.gr/guide.html. This guide introduces users to the GUI elements and covers basic use cases. We are working on video tutorials and detailed documentation, which will be available soon (as part of the revised manuscript). The toolbox will be split into two parts: a Bokeh app and a standalone library. The library will offer the core functionality, such as reducing morphology and standardizing channels, without the GUI, enabling bulk processing. It will be installable through PyPI and integrated into the app code as an external library. We will provide thorough documentation for all classes and functions in the library.

      Reviewer #2 (Public review):

      The paper by Makarov et al. describes the software tool called DendroTweaks, intended for the examination of multi-compartmental biophysically detailed neuron models. It offers extensive capabilities for working with very complex distributed biophysical neuronal models and should be a useful addition to the growing ecosystem of tools for neuronal modeling.

      Strengths

      (1) This Python-based tool allows for visualization of a neuronal model's compartments.

      (2) The tool works with morphology reconstructions in the widely used .swc and .asc formats.

      (3) It can support many neuronal models using the NMODL language, which is widely used for neuronal modeling.

      (4) It permits one to plot the properties of linear and non-linear conductances in every compartment of a neuronal model, facilitating examination of the model's details.

      (5) DendroTweaks supports manipulation of the model parameters and morphological details, which is important for the exploration of the relations of the model composition and parameters with its electrophysiological activity.

      (6) The paper is very well written - everything is clear, and the capabilities of the tool are described and illustrated with great attention to detail.

      We thank the reviewer for their positive comments.

      Weaknesses

      (1) Not a really big weakness, but it would be really helpful if the authors showed how the performance of their tool scales. This can be done for an increasing number of compartments - how long does it take to carry out typical procedures in DendroTweaks, on a given hardware, for a cell model with 100 compartments, 200, 300, and so on? This information will be quite useful to understand the applicability of the software.

      DendroTweaks functions as a layer on top of a simulation engine. As a result, its performance currently scales with that of NEURON. Note that the GUI displays the time taken to run a given simulation in NEURON at the bottom of the Simulation tab in the left menu. While GUI-related processing and rendering also consume time, this is not as straightforward to measure. Nonetheless, we will explore options to provide the suggested benchmarking in the revised manuscript.
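
      As an illustration of the kind of scaling benchmark the reviewer asks for, a minimal timing harness might look like the sketch below. It is only a sketch under stated assumptions: `run_simulation` is a hypothetical stand-in (a toy passive chain in pure Python), not a DendroTweaks or NEURON call, and the compartment counts simply follow the reviewer's example.

```python
import time


def run_simulation(n_compartments, n_steps=200):
    """Hypothetical stand-in for one simulation run.

    Integrates a toy passive chain with explicit Euler steps. It is NOT
    DendroTweaks or NEURON code; swap in the real call to benchmark the tool.
    """
    v = [0.0] * n_compartments
    v[0] = 1.0  # depolarize the first compartment
    coupling, leak = 0.1, 0.01
    for _ in range(n_steps):
        new_v = v[:]
        for i in range(n_compartments):
            # Sealed ends: a missing neighbour contributes no current.
            left = v[i - 1] if i > 0 else v[i]
            right = v[i + 1] if i < n_compartments - 1 else v[i]
            new_v[i] = v[i] + coupling * (left + right - 2.0 * v[i]) - leak * v[i]
        v = new_v
    return v


def benchmark(sizes, repeats=3):
    """Return {n_compartments: best wall-clock time in seconds}."""
    results = {}
    for n in sizes:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_simulation(n)
            timings.append(time.perf_counter() - start)
        results[n] = min(timings)  # best-of-N reduces timing noise
    return results


if __name__ == "__main__":
    for n, seconds in benchmark([100, 200, 300]).items():
        print(f"{n} compartments: {seconds * 1e3:.2f} ms")
```

      Reporting the best of several repeats per size is one common choice; GUI rendering time would need to be measured separately.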

      (2) Let me also add here a few suggestions (not weaknesses, but something that can be useful, and if the authors can easily add some of these for publication, that would strongly increase the value of the paper).

      (3) It would be very helpful to add functionality to read major formats in the field, such as NeuroML and SONATA.

      We agree with the reviewer that support for major formats will substantially improve and ensure reproducibility and reusability of the models. As mentioned in the Discussion, we plan to add support for NeuroML. Regarding SONATA, it is indeed possible to view our models as a network with a single morphologically-detailed biophysical node receiving inputs from multiple populations of virtual nodes. In future editions of the tool we plan to expand its support for additional file formats.

      (4) Visualization is available as a static 2D projection of the cell's morphology. It would be nice to implement 3D interactive visualization.

      We offer an option to rotate a cell around the vertical axis using a slider under the plot. This is a workaround, as implementing a true 3D visualization in Bokeh would require custom Bokeh elements, along with external JavaScript libraries. Despite these implementation difficulties, we advocate for a different approach than the one used in most of the morphology viewers mentioned in the Discussion. The core idea of DendroTweaks' morphology exploration is that each section is "clickable" allowing its geometric properties to be examined in a 2D Section view. Furthermore, we believe the Graph view presents the overall cell topology more clearly than a 3D visualization.
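
      The rotate-then-project workaround described above can be sketched generically as follows. This is an illustrative sketch, not the DendroTweaks implementation: the function names are hypothetical, and it assumes that y is the vertical axis and that the 2D view is an orthographic projection onto the x-y plane.

```python
import math


def rotate_about_vertical(points, angle_deg):
    """Rotate (x, y, z) points about the vertical (y) axis by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (x * cos_t + z * sin_t, y, -x * sin_t + z * cos_t)
        for x, y, z in points
    ]


def project_to_2d(points):
    """Orthographic projection onto the x-y plane (drop the depth axis z)."""
    return [(x, y) for x, y, _ in points]


if __name__ == "__main__":
    # Three hypothetical morphology points: a soma and two dendritic tips.
    morphology = [(0.0, 0.0, 0.0), (10.0, 50.0, 5.0), (-8.0, 40.0, -12.0)]
    for angle in (0, 45, 90):
        print(angle, project_to_2d(rotate_about_vertical(morphology, angle)))
```

      A slider value would map directly onto `angle_deg`, so each slider update only needs a re-projection of the stored 3D coordinates.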

      (5) It is nice that DendroTweaks can modify the models, such as revising the radii of the morphological segments or ionic conductances. It would be really useful then to have the functionality for writing the resulting models into files for subsequent reuse.

      This functionality is already available. Users can export JSON files with channel distributions and SWC files after morphology reduction through the GUI. In the standalone version, users can modify and export SWC files, as well as export MOD files after standardization. Please note that in the online demo version export and import functionality is currently limited, but we plan to fully enable it when submitting our revisions. We are considering separating file managers as satellite apps: one for SWC and one for MOD files. It is worth mentioning that the MOD file manager, in addition to parsing the files and generating Python classes for visualization purposes, is already capable of producing Jaxley-compatible Python channel classes.
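
      To make the modify-and-export idea concrete, the sketch below rescales radii and serializes the result in the standard SWC column order (id, type, x, y, z, radius, parent, with -1 as the root's parent). The helper names are hypothetical and this is not the DendroTweaks exporter; it only illustrates the file format.

```python
def scale_radii(nodes, factor, type_id=None):
    """Return a copy of nodes with radii multiplied by factor.

    Each node is a tuple (id, type, x, y, z, radius, parent), the column
    order of the SWC format. If type_id is given, only that type is scaled.
    """
    scaled = []
    for node_id, node_type, x, y, z, radius, parent in nodes:
        if type_id is None or node_type == type_id:
            radius *= factor
        scaled.append((node_id, node_type, x, y, z, radius, parent))
    return scaled


def to_swc(nodes):
    """Serialize nodes as SWC text: one space-separated row per node."""
    lines = ["# id type x y z radius parent"]
    for node_id, node_type, x, y, z, radius, parent in nodes:
        lines.append(
            f"{node_id} {node_type} {x:.3f} {y:.3f} {z:.3f} {radius:.3f} {parent}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-point morphology: soma (type 1) and basal dendrite (type 3).
    nodes = [(1, 1, 0.0, 0.0, 0.0, 5.0, -1), (2, 3, 10.0, 0.0, 0.0, 1.0, 1)]
    print(to_swc(scale_radii(nodes, 0.5, type_id=3)))
```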

      (6) If I didn't miss something, it seems that DendroTweaks supports the allocation of groups of synapses, where all synapses in a group receive the same type of Poisson spike train. It would be very useful to provide more flexibility. One option is to leverage the SONATA format, which has ample functionality for specifying such diverse inputs.

      Currently, each group shares the same set of parameters for both biophysical properties of synapses (e.g., reversal potential, time constants) and presynaptic "population" activity (e.g., rate, onset). The parameter that controls an incoming Poisson spike train is the rate, which is indeed shared across all synapses in a group. The suggestion to allow for variability in input properties within a group is interesting and is worth implementing. We will explore this in the revised manuscript.
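      As a rough illustration of the behavior described above, the sketch below draws Poisson spike trains for a group of synapses that share one rate and onset, and shows how per-synapse rates could be supplied instead. It is not DendroTweaks code: the function names (`poisson_spike_train`, `group_spike_trains`, `heterogeneous_group`) and all parameter names are hypothetical, and it assumes a homogeneous Poisson process with an independent draw for each synapse.

```python
import random


def poisson_spike_train(rate_hz, duration_s, onset_s=0.0, rng=None):
    """Return increasing spike times (s) of a homogeneous Poisson process.

    Spikes are generated from onset_s up to (but excluding) duration_s.
    """
    if rng is None:
        rng = random.Random()
    spikes = []
    if rate_hz <= 0:
        return spikes
    t = onset_s
    while True:
        # Inter-spike intervals are exponential with mean 1 / rate_hz.
        t += rng.expovariate(rate_hz)
        if t >= duration_s:
            break
        spikes.append(t)
    return spikes


def group_spike_trains(n_synapses, rate_hz, duration_s, onset_s=0.0, seed=0):
    """One independent train per synapse; rate and onset shared by the group."""
    rng = random.Random(seed)
    return [poisson_spike_train(rate_hz, duration_s, onset_s, rng)
            for _ in range(n_synapses)]


def heterogeneous_group(rates_hz, duration_s, onset_s=0.0, seed=0):
    """Hypothetical variant: each synapse in the group gets its own rate."""
    rng = random.Random(seed)
    return [poisson_spike_train(rate, duration_s, onset_s, rng)
            for rate in rates_hz]


if __name__ == "__main__":
    shared = group_spike_trains(n_synapses=3, rate_hz=20.0, duration_s=1.0)
    varied = heterogeneous_group([5.0, 20.0, 50.0], duration_s=1.0)
    print([len(train) for train in shared])
    print([len(train) for train in varied])
```

      In this sketch, within-group variability amounts to replacing the single shared rate with a list of rates; the same pattern would extend to per-synapse onsets.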

      (7) "Each session can be saved as a .json file and reuploaded when needed" - do these files contain the whole history of the session or the exact snapshot of what is visualized when the file is saved? If the latter, which variables are saved, and which are not? Please clarify.

      These files capture the exact snapshot of the model's latest state. They include model parameters such as channel distributions, equilibrium potentials, and temperature. Currently, stimuli (current clamps and synapses) are not saved. However, we plan to add an option to export stimuli parameters in the same JSON file. This will also be available as part of the revised manuscript.

      References

      Michael Deistler, Kyra L. Kadhim, Matthijs Pals, Jonas Beck, Ziwei Huang, Manuel Gloeckler, Janne K. Lappalainen, Cornelius Schröder, Philipp Berens, Pedro J. Gonçalves, Jakob H. Macke. Differentiable simulation enables large-scale training of detailed biophysical models of neural dynamics. bioRxiv 2024.08.21.608979; doi: https://doi.org/10.1101/2024.08.21.608979

    1. eLife Assessment

      This important study addresses the question of how large-scale events such as the COVID-19 pandemic can change people's beliefs and their updates. Using a well-validated task, the authors find that belief updating becomes less optimistically biased during COVID-19, compared to prior to it. While they have solid evidence for specific changes of positive learning rates in their RL model and should be commended for more extensive modeling of the task than many previous studies, comprehensive model confusion and recovery analyses could further strengthen their claims by clarifying the specificity of their findings. As with many manipulations outside the experimenters' control, it remains unclear which psychological factor impacted by the pandemic drives the group differences, and sample sizes are by necessity on the smaller side as data cannot readily be acquired.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses a well-validated behavioral estimation task to investigate the degree to which optimistic belief updating was attenuated during the 2020 global pandemic. Online participants recruited during and outside of the pandemic estimated how likely different negative life events were to happen to them in the future and were given statistics about these events happening. Belief updating (measured as the degree to which estimations changed after viewing the statistics) was less optimistically biased during the pandemic (compared to outside of it). This resulted from reduced updating from "good news" (better than expected information). Computational models were used to try to unpack how statistics were integrated and used to revise beliefs. Two families of models were compared: an RL set of models where "estimation errors" (analogous to prediction errors in classic RL models) predict belief change, and a Bayesian set of models where an implied likelihood ratio was calculated (derived from participants' estimations of their own risk and estimation of the base rate risk) and used to predict belief change. The authors found evidence that the former set of models accounted for updating better outside of the pandemic, but the latter accounted for updating during the pandemic. In addition, the RL model provides evidence that learning was asymmetrically positively biased outside of the pandemic but symmetric during it (as a result of reduced learning rates from good news estimation errors).

      Strengths:

      Understanding whether biases in learning are fixed modes of information processing or flexible and adapt in response to environmental shocks (like a global pandemic or economic recession) is an important area of research relevant to a wide range of fields, including cognitive psychology, behavioral economics, and computational psychiatry. The study uses a well-validated task, and the authors conduct a power analysis to show that the sample sizes are appropriate. Furthermore, the authors test that their results hold in both a between-group analysis (the focus of the main paper) and a within-group analysis (mainly in the supplemental).

      The finding that optimistic biases are reduced in response to acute stress, perceived threat, and depression has been shown before using this task in the lab (social stress manipulation), in the real world (firefighters on duty), and in clinical groups (patients with depression). However, the work does extend these findings here in important ways:

      (1) Examining the effect of a new real-world adverse event (the pandemic).

      (2) The reduction in optimistic updating here arises due to reduced updating from positive information (previously, in the case of environmental threat, this reduction mainly arose from increased sensitivity to negative information).

      (3) Leveraging new RL-inspired computational approaches, demonstrating that the bias - and its attenuation - can be captured using trial-by-trial computational modeling with separate learning rates for positive and negative estimation errors.

      Weaknesses:

      Some interpretation and analysis (the computational modeling in particular) could be improved.

      On the interpretation side, while the pandemic was an adverse experience and stressful for many people (including myself), the absence of any measures of stress/threat levels limits the conclusions one can draw. Past work that has used this task to examine belief updating in response to adverse environmental events took physiological (e.g., SCR, cortisol) and/or self-report (questionnaires) measures of mood. In SI Table 1, the authors possibly had some questionnaire measures along these lines, but this might be for the participants tested during the pandemic.

      On the analysis side, it was unclear what the motivation was for the different sets of models tested. Both families of models test asymmetric vs symmetric learning (which is the main question here) and have similar parameters (scaling and asymmetry parameters) to quantify these different aspects of the learning process. Conceptually, the different behavioral patterns one could expect from the two families of models needed to be clarified. Do the "winning" models produce the main behavioral patterns in Figure 1, and are they in some way uniquely able to do so, for instance? How would updating look different for an optimistic RL learner versus an optimistic Bayesian RL learner? Would the asymmetry parameter in the former be correlated with the asymmetry parameter in the latter? Moreover, crucially, would one be able to reliably distinguish the models from one another under the model estimation and selection criteria that the authors have used here (presenting robust model recovery could help to show this)?

    3. Reviewer #2 (Public review):

      The authors investigated how experiencing the COVID-19 pandemic affected optimism bias in updating beliefs about the future. They ran a between-subjects design testing participants on cognitive tasks before, during, and after the lifting of the sanitary state of emergency during the pandemic. The authors show that optimism bias varied depending on the context in which it was tested. Namely, it disappeared during COVID-19 and re-emerged when the sanitary emergency measures were lifted. Through advanced computational modeling, they are able to thoroughly characterize the nature of such alterations, pinpointing specific mechanisms underlying the lack of optimistic bias during the pandemic.

      Strengths pertain to the comprehensive assessment of the results via computational modeling and, from a theoretical point of view, to the notion that environmental factors can affect cognition. However, the relatively small sample size for each group is a limitation. A major impediment to interpreting the findings is the need for additional measures. While information on, for example, risk perception or the need for social interaction was collected from participants during the pandemic, the fact that these could not be included in the analysis hinders the interpretation of findings, which is now generally based on data collected during the pandemic, for example, reporting increased stress. While the authors suggest an interpretation in terms of uncertainty of real-life conditions, it is currently difficult to know if that factor drove the effect. Many concurrent elements might have accounted for the findings. This limits understanding of the underlying mechanisms related to changes in optimism bias.

    4. Author response:

      To reviewer #1:

      We appreciate your advice on providing more conceptual motivations for comparing Bayesian and RL-like belief updating models. In short, both model families are complementary in capturing asymmetrical and symmetrical updating. They both consider that the magnitude of updating is weighted by two separate learning rates, one for positive and one for negative belief-disconfirming evidence. If these two learning rates differ, updating is asymmetrical; if they are equal, updating is symmetrical.
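      The distinction between asymmetrical and symmetrical updating can be made concrete with a minimal sketch, assuming the simplest RL-like rule in which the change in belief equals a learning rate times the estimation error. This is not the authors' fitted model (which, according to the reviews, also includes scaling and asymmetry parameters); the function names `rl_like_update` and `is_symmetric`, the parameter names, and the sign convention (a base rate below the initial estimate counts as good news for an adverse event) are assumptions made for illustration.

```python
def rl_like_update(initial_estimate, base_rate, lr_good, lr_bad):
    """Return a revised risk estimate (in %) for one adverse life event.

    The estimation error is the gap between the presented base rate and the
    initial estimate. For adverse events, a base rate lower than expected is
    treated as good news and weighted by lr_good; otherwise lr_bad is used.
    """
    estimation_error = base_rate - initial_estimate
    is_good_news = estimation_error < 0
    learning_rate = lr_good if is_good_news else lr_bad
    return initial_estimate + learning_rate * estimation_error


def is_symmetric(lr_good, lr_bad, tol=1e-9):
    """Updating is symmetrical when both learning rates are (nearly) equal."""
    return abs(lr_good - lr_bad) <= tol


if __name__ == "__main__":
    # Hypothetical optimistic learner: good news is weighted more than bad news.
    print(rl_like_update(40.0, 20.0, lr_good=0.5, lr_bad=0.2))
    print(rl_like_update(20.0, 40.0, lr_good=0.5, lr_bad=0.2))
    print(is_symmetric(0.5, 0.2), is_symmetric(0.3, 0.3))
```

      With equal learning rates, the same rule moves the estimate by the same fraction of the error for good and bad news, which corresponds to the symmetric pattern reported during the pandemic.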

      However, the model families’ assumptions about the underlying updating process differ. In the RL-like belief updating model family, this process is assumed to be driven by comparing base rates and initial beliefs, also known as the prediction error (PE), weighted by the learning rates. In contrast, the Bayesian updating model assumes that updating (i.e., the posterior belief) is driven by combining the base rate (i.e., the prior evidence) and how often the initial belief is represented in the estimated base rate (i.e., the likelihood ratio of all other alternative hypotheses, beliefs). Moreover, the two components of the posterior belief can differ in their respective contribution (i.e., precision or confidence), which might be more adaptive to external actual life conditions characterized by high uncertainty about the future.

      For the revised manuscript, we will elaborate more on the conceptual and psychological meaning of these two proposed belief updating processes. So far, it is important to note that we do not have direct proof of humans reasoning in an RL-like or Bayesian way when updating their beliefs about the future. We, therefore, focus on the complementarity of both models to capture latent processes and variables in belief updating that can be leveraged to understand the sources of inter-individual differences and the impact of external contexts such as experiencing an actual adverse life event on human psychology.

      To reviewer #2:

      Thank you for recommending the exploration of potential differences between optimism biases in initial belief estimations (self versus other) during and outside the pandemic. We will also provide more details on the belief updating task and design.

      To both reviewers: 

      We agree on the limitations arising from the lack of physiological and self-reported measures of stress. We collected some self-reports on risk perception, adoption of protective measures, need for social interactions, and mood, but solely in participants tested during the pandemic-related lockdowns (reported in the SI Table 1). For the revised manuscript, we propose exploring the correlational links between belief-updating biases and self-reports in this sample. The expected outcomes of such correlational analyses may identify the variables to target with interventions in future studies of human belief updating under real-world contexts. We also will add a relevant section to the discussion to elaborate on the limitation that hinders inferring plausible psychological causes of the differences observed in belief updating during and outside the pandemic.

      Importantly, we will follow your recommendations to improve the computational modeling analyses. We will (1) add the confusion matrices from model recovery analyses to assess specificity, (2) provide evidence that the best-fitting model reproduces the observed behavior shown in Figure 1, and (3) conduct model comparisons on the combined groups to justify the focus on the RL-like updating model. In a few weeks, we plan to submit a revised manuscript alongside a point-by-point response to your concerns and recommendations.

    1. eLife Assessment

      The study presents important findings on inositol-requiring enzyme (IRE1α) inhibition on diet-induced obesity (overnutrition) and insulin resistance where IRE1α inhibition enhances thermogenesis and reduces the metabolically active and M1-like macrophages in adipose tissue. The evidence supporting the conclusions is convincing but can be enhanced with information/data on the validity, specificity, selectivity, and toxicity of the IRE1α inhibitor and supported with more detail on the mechanisms by which adipose tissue macrophages influence adipocyte metabolism. The work will be of interest to cell biologists and biochemists working in metabolism, insulin resistance, and inflammation.

    2. Reviewer #1 (Public review):

      First, the authors confirm the up-regulation of the main genes involved in the three branches of the Unfolded Protein Response (UPR) system in diet-induced obese mice in AT, observations that have been extensively reported before. Not surprisingly, IRE1a inhibition with STF led to an amelioration of the obesity and insulin resistance of the animals. Moreover, non-alcoholic fatty liver disease was also improved by the treatment. More novel are their results in terms of thermogenesis and energy expenditure, where IRE1a seems to act via activation of brown AT. Finally, mice treated with STF exhibited significantly fewer metabolically active and M1-like macrophages in the AT compared to those under vehicle conditions. Overall, the authors conclude that targeting IRE1a has therapeutical potential for treating obesity and insulin resistance.

      The study has some strengths, such as the detailed characterization of the effect of STF in different fat depots and a thorough analysis of macrophage populations. However, the lack of novelty in the findings somewhat limits the study's impact on the field.

    3. Reviewer #2 (Public review):

      The manuscript by Wu et al demonstrated that IRE1a inhibition mitigated insulin resistance and other comorbidities through increased energy expenditure in DIO mice. In this reviewer's opinion, this timely study has high significance in the field of metabolism research for the following reasons.

      (1) The authors' findings are significant and may offer a new therapeutic target to treat metabolic diseases, including diabetes, obesity, NAFLD, etc.

      (2) The authors carefully profiled the ATMs and examined the changes in gene expression after STF treatment.

      (3) The authors presented evidence collected from both systemic indirect calorimetry and individual tissue gene expression to support the notion of increased energy expenditure.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Wu D. et al. explores an innovative approach to immunometabolism and obesity by investigating the potential of targeting macrophage Inositol-requiring enzyme 1α (IRE1α) in cases of overnutrition. Their findings suggest that pharmacological inhibition of IRE1α could influence key aspects such as adipose tissue inflammation, insulin resistance, and thermogenesis. Notable discoveries include the identification of High-Fat Diet (HFD)-induced CD9+ Trem2+ macrophages and the reversal of metabolically active macrophages' activity with IRE1α inhibition using STF. These insights could significantly impact future obesity treatments.

      Strengths:

      The study's key strengths lie in its identification of specific macrophage subsets and the demonstration that inhibiting IRE1α can reverse the activity of these macrophages. This provides a potential new avenue for developing obesity treatments and contributes valuable knowledge to the field.

      Weaknesses:

      The research lacks an in-depth exploration of the broader metabolic mechanisms involved in controlling diet-induced obesity (DIO). Addressing this gap would strengthen the understanding of how targeting IRE1α might fit into the larger metabolic landscape.

      Impact and Utility:

      The findings have the potential to advance the field of obesity treatment by offering a novel target for intervention. However, further research is needed to fully elucidate the metabolic pathways involved and to confirm the long-term efficacy and safety of this approach. The methods and data presented are useful, but additional context and exploration are required for broader application and understanding.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      First, the authors confirm the up-regulation of the main genes involved in the three branches of the Unfolded Protein Response (UPR) system in diet-induced obese mice in AT, observations that have been extensively reported before. Not surprisingly, IRE1a inhibition with STF led to an amelioration of the obesity and insulin resistance of the animals. Moreover, non-alcoholic fatty liver disease was also improved by the treatment. More novel are their results in terms of thermogenesis and energy expenditure, where IRE1a seems to act via activation of brown AT. Finally, mice treated with STF exhibited significantly fewer metabolically active and M1-like macrophages in the AT compared to those under vehicle conditions. Overall, the authors conclude that targeting IRE1a has therapeutical potential for treating obesity and insulin resistance.

      The study has some strengths, such as the detailed characterization of the effect of STF in different fat depots and a thorough analysis of macrophage populations. However, the lack of novelty in the findings somewhat limits the study's impact on the field.

      We thank the reviewer for the appreciation of our findings and the comments about the novelty. Regarding the novelty, we would emphasize several novel aspects of this manuscript. First, as the reviewer correctly pointed out, we discovered that IRE1 inhibition by STF activates brown AT and promotes thermogenesis and that IRE1 inhibition not only significantly attenuated the newly discovered CD9+ ATMs and the “M1-like” CD11c+ ATMs but also diminished the M2 ATMs for the first time. These discoveries are very important and novel. In obesity, it was originally proposed that ATMs undergo M1/M2 polarization from an anti-inflammatory M2 to a classical pro-inflammatory M1 state. It was further reported that IRE1 deletion improves thermogenesis by boosting the M2 population, which then synthesizes and secretes catecholamines to promote thermogenesis. It is now known that M2 macrophages do not synthesize catecholamines or promote thermogenesis. In this study, we discovered that IRE1 inhibition does not increase (but instead decreases) the M2 population and that IRE1 inhibition promotes thermogenesis likely by suppressing pro-inflammatory macrophage populations, including the M1-like ATMs and, most importantly, the newly identified metabolically active macrophages, given that ATM inflammation has been reported to suppress thermogenesis. Second, this study presented the first characterization of the relationship between the more classical M1-like ATMs and the newly discovered metabolically active ATMs, showing that the CD11c+ M1-like ATMs largely overlap with, yet are not identical to, CD9+ ATMs in the eWAT under HFD. Third, although upregulation of ER stress response genes in the adipose tissues of diet-induced obese mice has been extensively reported, this does not necessarily mean that targeting IRE1a or ER stress can reverse existing insulin resistance and obesity. It is not uncommon that a therapy does not yield the expected effect. For instance, amyloid plaques are a hallmark of Alzheimer's disease (AD), and interventions that prevent or reverse beta-amyloid deposition were expected to prevent progression or even reverse cognitive impairment in AD patients. However, clinical trials of such therapies have been disappointing. In essence, experimental demonstration of effectiveness or feasibility for any potential therapeutic target is a first step toward any future clinical implementation.

      Reviewer #2 (Public review):

      The manuscript by Wu et al demonstrated that IRE1a inhibition mitigated insulin resistance and other comorbidities through increased energy expenditure in DIO mice. In this reviewer's opinion, this timely study has high significance in the field of metabolism research for the following reasons.

      (1) The authors' findings are significant and may offer a new therapeutic target to treat metabolic diseases, including diabetes, obesity, NAFLD, etc.

      (2) The authors carefully profiled the ATMs and examined the changes in gene expression after STF treatment.

      (3) The authors presented evidence collected from both systemic indirect calorimetry and individual tissue gene expression to support the notion of increased energy expenditure.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      We thank the reviewer for the appreciation of our work.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Wu D. et al. explores an innovative approach to immunometabolism and obesity by investigating the potential of targeting macrophage Inositol-requiring enzyme 1α (IRE1α) in cases of overnutrition. Their findings suggest that pharmacological inhibition of IRE1α could influence key aspects such as adipose tissue inflammation, insulin resistance, and thermogenesis. Notable discoveries include the identification of High-Fat Diet (HFD)-induced CD9+ Trem2+ macrophages and the reversal of metabolically active macrophages' activity with IRE1α inhibition using STF. These insights could significantly impact future obesity treatments.

      Strengths:

      The study's key strengths lie in its identification of specific macrophage subsets and the demonstration that inhibiting IRE1α can reverse the activity of these macrophages. This provides a potential new avenue for developing obesity treatments and contributes valuable knowledge to the field.

      Weaknesses:

      The research lacks an in-depth exploration of the broader metabolic mechanisms involved in controlling diet-induced obesity (DIO). Addressing this gap would strengthen the understanding of how targeting IRE1α might fit into the larger metabolic landscape.

      Impact and Utility:

      The findings have the potential to advance the field of obesity treatment by offering a novel target for intervention. However, further research is needed to fully elucidate the metabolic pathways involved and to confirm the long-term efficacy and safety of this approach. The methods and data presented are useful, but additional context and exploration are required for broader application and understanding.

      We thank the reviewer for the appreciation of the strengths of our manuscript. In particular, we appreciate the reviewer’s recommendation to explore the broader metabolic landscape, such as the effect of IRE1 inhibition on non-adipose tissue macrophages and metabolism. We agree that doing so would broaden the therapeutic potential of IRE1 inhibition to a wider range of metabolic disorders, and we will pursue these explorations in future studies.

    1. eLife Assessment

      This study provides important insights into the question of how interacting brain areas produce behaviour during the execution of a skilled multi-directional reaching task. Using a combination of single neuron and neural population analysis, as well as optogenetic stimulation and computational models, the authors provide solid evidence of an asymmetrical influence between mouse premotor and motor cortex during the execution of a well-practiced behaviour. This asymmetry can only be captured by some but not all population analysis methods, which is a key lesson to the field in and of itself. Analyzing how activity that is shared and private to these areas relates to different aspects of movements, and linking the model predictions to the actual data, would further strengthen this work.

    2. Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. 
These results put into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlights the complexity of the interactions between multiple regions in the motor system and goes against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, which should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics. Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

      Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

    3. Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

    4. Reviewer #3 (Public review):

      This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using (1) optogenetic manipulations in one area while recording extracellularly from the other, (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and (3) network modeling. The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25 ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50 ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (Granger causality, transfer entropy, and convergent cross-mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be premature to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details. An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

    5. Author response:

      We thank the reviewers for their constructive feedback, which will both improve the present manuscript and help us update our approach as we continue to examine interregional interactions in the motor system. Below we address the concerns raised in the Public Reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. These results call into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlight the complexity of the interactions between multiple regions in the motor system and go against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds an important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, that should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. 

      The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics.

      We do believe that our optogenetic inactivation identifies a causal interaction between the endogenous activity patterns in the excitatory projection neurons that are largely silenced, and the endogenous activity that is affected in a downstream region. To clarify, the effect in the downstream region results directly from the silencing of activity in the excitatory projection neurons that connect RFA and CFA. 

      Here we have performed a causal intervention common in biology: a loss-of-function experiment. Such experiments generally reveal that a causal interaction of some sort is present, but often do not clarify much about the nature of the interaction, as is true in our case. By showing that the silencing of endogenous activity in one motor cortical region causes a significant change to the endogenous activity in another, we establish a causal relationship between these activity patterns.

      This is analogous to knocking out the gene for a transcription factor and observing causal effects on the expression of other genes that depend on it.

      Moreover, our experiments are, to our knowledge, the first that localize a causal relationship to endogenous activity in motor cortical regions at a particular point during motor behavior. Stimulation experiments generate spiking in excitatory projection neurons that is not endogenous. Lesion and pharmacological or chemogenetic inactivation have long-lasting effects, and so their consequences on firing in other regions cannot be attributed to a short-latency influence of activity at a particular point during movement. Moreover, the involvement of motor cortex in motor learning and movement preparation/initiation complicates the interpretation of these consequences vis-à-vis movement execution, as disturbance to processes on which execution depends can impede execution itself. 

      That said, we would agree that the form of the causal interaction between RFA and CFA remains largely unaddressed by our results. These results do not expose how the silenced activity patterns affect activity in the downstream region, just as transcription factor gene knockouts do not expose how the effect on transcription occurs. To show evidence for specific interaction dynamics between RFA and CFA, a different sort of experiment would be necessary. See Jazayeri and Afraz, Neuron, 2017 for more on this issue.

      Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. 

      Though N = 3 in this case, we do show statistical significance. Moreover, using three replicates is not uncommon in biological experiments that require a large technical investment, including those in rodents.

      As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? 

      We (Miri et al., Neuron, 2017) and others (Guo et al., Neuron, 2014) have shown that the effect of this inactivation on excitatory neurons in CFA is a near-complete silencing (90-95% within 20 ms). Thus there is not much room for the effects on projection neurons in RFA to be much larger. As part of other work currently in review, we have verified that the effects on RFA projection neuron firing are not larger.

      Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

      We agree that the dependence of RFA-CFA interactions on movement phase would be interesting to address in subsequent experiments. While a strong interpretation of past lesion results might lead to a hypothesis that premotor influence on primary motor cortex is local to, or stronger during, movement preparation as opposed to execution, at present there is to our knowledge no empirical support from interventional experiments for this hypothesis. Moreover, existing results from analysis of activity in premotor and primary motor cortex have produced conflicting results on the strength of interaction between these regions during preparation. Compare for example Bachschmid-Romano et al., eLife, 2023 to Kaufman et al., Nature Neuroscience, 2014.

      That said, this lesion interpretation would predict the same asymmetry we have observed from perturbations at the beginning of a reach – a larger effect of RFA on CFA than vice versa.

      Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

      Our results in this case were definitely surprising. Many share the intuition that there should be a lag at which the correlations in activity between connected regions will be strongest. Similarity in alignment across lags might be expected if communication between regions occurs over a range of latencies as a result of dependence on a broad diversity of synaptic paths that connect neurons. In the Discussion, we offer an explanation of how to reconcile these findings with the seemingly different picture presented by DLAG.
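
      The lag-sweep logic discussed in this exchange can be illustrated with a minimal sketch. This is not the analysis code used in the study: the function names `canonical_correlations` and `lagged_cca`, the QR-based CCA, and the choice to summarize each lag by the mean canonical correlation are all assumptions made for illustration. The sketch shifts one region's activity matrix relative to the other and recomputes the canonical correlations at each lag. With a single fixed transmission delay the resulting profile should peak at that delay, which is the intuition the reviewer describes.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two (time x neurons) activity matrices.

    Uses the standard QR + SVD formulation: after centering, the singular
    values of Qx^T Qy are the canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def lagged_cca(X, Y, lags):
    """Mean canonical correlation for each lag (in samples).

    A positive lag pairs X[t] with Y[t + lag], i.e. X leads Y by `lag` samples.
    Returns a dict mapping lag -> mean canonical correlation.
    """
    T = X.shape[0]
    profile = {}
    for lag in lags:
        if lag >= 0:
            Xs, Ys = X[:T - lag], Y[lag:]
        else:
            Xs, Ys = X[-lag:], Y[:T + lag]
        profile[lag] = float(canonical_correlations(Xs, Ys).mean())
    return profile
```

      Applied to synthetic data in which one hypothetical population carries the same latent signal as the other at a fixed delay, the profile returned by `lagged_cca` peaks at that delay. A profile that stays flat across lags, as reported here, would instead be expected if the shared signal is slow relative to the lag range or is communicated over many latencies at once.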

      Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      We certainly agree that the similarity in firing patterns is higher in trial averages than on single trials, given the variation in single-neuron firing patterns across trials. Here, we were trying to examine the similarity of activity variance that is clearly movement dependent, as trial averages are, and to use an approach that mirrors those applied in much of the existing literature. We would also agree that there is more that can be learned about interactions from trial-by-trial analysis. 

      It is possible that the activity components identified by DLAG as being asymmetric somehow are not reflected strongly in trial averages. In our Discussion we offer another potential explanation related to the differences in what is calculated in DLAG and CCA/PLS.

      We also note here that all of the firing pattern predictivity analysis we report (Figure 6) was done on single-trial data, and in all cases the predictivity was symmetric. Thus, our results in aggregate are not consistent with symmetry purely being an artifact of trial averaging.
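
      The effect of trial averaging on apparent similarity, raised in point (1), can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reanalysis of the recorded data: the two "regions" are hypothetical single signals that share one movement-locked component (a sine wave) and carry independent trial-to-trial noise, and the function name and parameters are invented for illustration.

```python
import numpy as np

def similarity_single_vs_average(n_trials=50, n_time=200, noise_sd=2.0, seed=0):
    """Compare single-trial and trial-averaged correlation between two signals.

    Both hypothetical signals contain the same movement-locked component plus
    independent noise on every trial. Returns (mean single-trial correlation,
    correlation of the two trial averages).
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0 * np.pi, n_time)
    shared = np.sin(t)  # component common to both signals on every trial
    a = shared + noise_sd * rng.standard_normal((n_trials, n_time))
    b = shared + noise_sd * rng.standard_normal((n_trials, n_time))
    single = float(np.mean([np.corrcoef(a[i], b[i])[0, 1] for i in range(n_trials)]))
    averaged = float(np.corrcoef(a.mean(axis=0), b.mean(axis=0))[0, 1])
    return single, averaged
```

      In this toy setting the correlation between trial averages is much higher than the typical single-trial correlation, because averaging suppresses the independent noise and leaves the shared component. This only illustrates why trial-averaged similarity is an upper bound of sorts on single-trial similarity; it does not bear on whether the asymmetry reported by DLAG is present in the averages.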

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      These are all very interesting questions. Our study does not attempt to parse activity into components predictive of muscle activity and others that may reflect other functions. Distinct components of RFA and CFA activity may be involved in distinct interactions between them.

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      The key comparison in Figure 7 is shown in 7F, where the firing rates are accounted for in calculating the across-region input strength. Equalizing the firing rates in RFA and CFA would effectively increase RFA rates. If the mean firing rates in each region were appreciably dependent on across-region inputs, we would then expect an off-setting change in the RFA→CFA weights, such that the RFA→CFA distributions in 7F would stay the same. We would also expect the CFA→RFA weights would increase, since RFA neurons would need more input. This would shift the CFA→RFA (blue) distributions up. Thus, if anything, the key difference in this panel would only get larger. 

      We also generally feel that it is a better approach to fit the actual firing rates, rather than normalizing, since normalizing the firing rates would take us further from the actual biology, not closer.

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      As we state above, we agree that examining interactions specific to movement-related activity components could be illuminating. Since it remains a challenge to rigorously identify a subset of activity patterns specifically related to driving muscle activity, any such analysis would involve an additional assumption. It remains unclear how well the motor cortical activity that decoders use for predicting muscle activity matches the motor cortical activity that actually drives muscle activity in situ. 
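
      The decoding comparison proposed in comment (4) can be illustrated with a minimal sketch. It is not an analysis from the study: the data are synthetic, the names (`ridge_decode_r2`, `region_a`, `region_b`, `muscle`) are hypothetical, and ridge regression is only one possible choice of linear decoder. The sketch fits a decoder per region and compares held-out R², which is one way to ask how much muscle-related information each population carries. It does not address the time-course comparison or the caveat raised in the response about whether decoded activity matches the activity that drives muscles.

```python
import numpy as np

def ridge_decode_r2(rates, target, alpha=1.0, train_fraction=0.8):
    """Fit a ridge decoder on the first part of the data; return held-out R^2."""
    n_train = int(len(target) * train_fraction)
    x_train, x_test = rates[:n_train], rates[n_train:]
    y_train, y_test = target[:n_train], target[n_train:]
    # Center with training statistics only, to avoid leaking test data.
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()
    xc = x_train - x_mean
    # Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T y
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ (y_train - y_mean))
    prediction = (x_test - x_mean) @ weights + y_mean
    ss_res = np.sum((y_test - prediction) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins (assumption): one muscle signal and two populations.
rng = np.random.default_rng(0)
n_samples, n_neurons = 2000, 20
muscle = rng.standard_normal(n_samples)
noise_a = rng.standard_normal((n_samples, n_neurons))
noise_b = rng.standard_normal((n_samples, n_neurons))
# Region A carries the muscle signal strongly, region B only weakly.
region_a = np.outer(muscle, rng.standard_normal(n_neurons)) + noise_a
region_b = 0.1 * np.outer(muscle, rng.standard_normal(n_neurons)) + noise_b

r2_a = ridge_decode_r2(region_a, muscle)
r2_b = ridge_decode_r2(region_b, muscle)
print(f"held-out R^2, region A: {r2_a:.2f}; region B: {r2_b:.2f}")
```

      With real recordings, the same function would be applied to binned firing rates from each region, ideally with cross-validation across trials rather than a single split.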

      Reviewer #3 (Public review):

      This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using

      (1) optogenetic manipulations in one area while recording extracellularly from the other,

      (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and

      (3) network modeling.

      The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25 ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50 ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (Granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be premature to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details.

      An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      To clarify, we are not claiming that firing patterns in no way reflect the asymmetric functional influence that we demonstrate with optogenetic inactivation. Instead, we show that certain types of analysis we might expect to reflect such influence, in fact, do not. Indeed, DLAG did exhibit asymmetries that matched those seen in functional influence (at least qualitatively), though other methods we applied did not.

      As we state above, we do think that there is more that can be gleaned by looking at influence specifically in terms of activity related to movement. However, if we did find that movement-related activity exhibited an asymmetry matching that of functional influence in cases where overall activity exhibited symmetry, our results imply that the activity not related to movement would exhibit an opposite asymmetry, such that the overall balance is symmetric. This would itself be surprising. We also note that the components identified by CCA and PLS show substantial variation across reach targets, indicating that they are not only reflecting condition-invariant components. These analyses used over 90% of the total activity variance, suggesting that both condition-dependent and condition-invariant components are included.

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

      While we do not provide experimental support for this hypothesis, the data we present also do not contradict this hypothesis. Here we used modeling as it is often used – to capture experimental results and generate hypotheses about potential explanations. We feel that our Discussion makes clear where the hypothesis derives from and does not misrepresent the lack of experimental support. We expect readers will take our engagement with this hypothesis with the appropriate grain of salt. The imaginable experiments to support such a hypothesis would constitute another substantial study requiring numerous controls – a whole other paper in itself.

    1. eLife Assessment

      This study presents an important finding that ant nest structure and digging behavior depend on ant age demographics for a ground-dwelling ant species (Camponotus fellah). By asking whether ants employ age-polyethism in excavation, the authors address a long-standing question about how individuals in collectives determine the overall state of the task they must perform, and their results may prove to be a key consideration for interpreting results from other studies in the field of social insect behavior. While the experimental evidence that the age of the ants and the group composition affect the digging of tunnels is solid, some of the analyses and modeling seem superfluous, as they do not further support the results or contribute to a deeper understanding of the system.

    2. Reviewer #1 (Public review):

      This study investigates how ant group demographics influence nest structures and group behaviors of Camponotus fellah ants, a ground-dwelling carpenter ant species (found locally in Israel) that build subterranean nest structures. Using a quasi-2D cell filled with artificial sand, the authors perform two complementary sets of experiments to try to link group behavior and nest structure: first, the authors place a mated queen and several pupae into their cell and observe the structures that emerge both before and after the pupae eclose (i.e., "colony maturation" experiments); second, the authors create small groups (of 5, 10, or 15 ants, each including a queen) within a narrow age range (i.e., "fixed demographic" experiments) to explore the dependence of age on construction. Some of the fixed demographic instantiations included a manually induced catastrophic collapse event; the authors then compared emergency repair behavior to natural nest creation. Finally, the authors introduce a modified logistic growth model to describe the time-dependent nest area. The modification introduces parameters that allow for age-dependent behavior, and the authors use their fixed demographic experiments to set these parameters, and then apply the model to interpret the behavior of the colony maturation experiments. The main results of this paper are that for natural nest construction, nest areas and morphologies depend on the age demographics of ants in the experiments: younger ants create larger nests and angled tunnels, while older ants tend to dig less and build predominantly vertical tunnels; in contrast, emergency response seems to elicit digging in ants of all ages to repair the nest.
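
      To make the idea of an age-dependent logistic growth model concrete, the following is a generic sketch and not the authors' model, whose equations are not reproduced in this review. It assumes a standard logistic equation, dA/dt = r(age) · A · (1 − A/K), with a hypothetical growth rate `r` that decays exponentially with ant age and a hypothetical saturation area `K` proportional to group size. All function names and parameter values are illustrative assumptions.

```python
import math

def digging_rate(age_days, r_young=0.5, decay_days=60.0):
    """Hypothetical per-day growth rate that declines exponentially with age."""
    return r_young * math.exp(-age_days / decay_days)

def simulate_nest_area(n_ants, start_age_days, days=100,
                       area_per_ant=10.0, initial_area=1.0, dt=0.1):
    """Euler-integrate dA/dt = r(age) * A * (1 - A / K), with K = area_per_ant * n_ants."""
    capacity = area_per_ant * n_ants  # assumed saturation area for this group size
    area = initial_area
    steps = int(round(days / dt))
    for step in range(steps):
        age = start_age_days + step * dt  # the group ages as the experiment runs
        area += dt * digging_rate(age) * area * (1.0 - area / capacity)
    return area

# Two groups of equal size that differ only in starting age (assumed values).
young_area = simulate_nest_area(n_ants=10, start_age_days=0)
old_area = simulate_nest_area(n_ants=10, start_age_days=200)
print(f"young group: {young_area:.1f}, old group: {old_area:.1f}")
```

      Under these assumptions the young group approaches the saturation area while the old group excavates only slightly beyond the initial area, which mirrors the qualitative result described above. In an actual analysis the rate and saturation parameters would be fit to the fixed demographic experiments.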

    3. Reviewer #2 (Public review):

      I enjoyed this paper and the approach to examining an accepted wisdom of ants determining overall density by employing age polyethism that would reduce the computational complexity required to match nest size with population (although I have some questions about the requirement that growth is infinite in such a solution). Moreover, the realization that models of collective behaviour may be inappropriate in many systems in which agents (or individuals) differ in the behavioural rules they employ, according to age, location, or information state. This is especially important in a system like social insects, typically held as a classic example of individual-as-subservient to whole, and therefore most likely to employ universal rules of behaviour. The current paper demonstrates a potentially continuous age-related change in target behaviour (excavation), and suggests an elegant and minimal solution to the requirement for building according to need in ants, avoiding the invocation of potentially complex cognitive mechanisms, or information states that all individuals must have access to in order to have an adaptive excavation output.

      The only real reservation I have is in the question of how this relationship could hold in properly mature colonies in which there is (presumably) a balance between the birth and death of older workers. Would the prediction be that the young ants still dig, or would there be a cessation of digging by young ants because the area is already sufficient? Another way of asking this is to ask whether the innate amount of digging that young ants do is in any way affected by the overall spatial size of the colony. If it is, then we are back to a problem of perfect information - how do the young ants know how big the overall colony is? Perhaps using density as a proxy? Alternatively, if the young ants do not modify their digging, wouldn't the colony become continuously larger? As a non-expert in social insects, I may be misunderstanding and it may be already addressed in the citations used.

      In any case, this is an excellent paper. The modelling approach is excellent and compelling, also allowing extrapolation to other group sizes and even other species. This to me is the main strength of the paper, as the answer to the question of whether it is younger or older ants that primarily excavate nests could have been answered by an individual tracking approach (albeit there are practical limitations to this, especially in the observation nest setup, as the authors point out). The analysis of the tunnel structure is also an important piece of the puzzle, and I really like the overall study.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Harikrishnan Rajendran, Roi Weinberger, Ehud Fonio, and Ofer Feinerman measured the digging behaviours of queens and workers for the first 6 months of colony development, as well as groups of young or old ants. They also provide a quantitative model describing the digging behaviours and allowing predictions. They found that young ants dig more slanted tunnels, while older ants dig more vertically (straight down). This finding is important, as it describes a new form of age polyethism (a division of labour based on age). Age polyethism is described as a "yes or no" mechanism, where individuals perform or not a task according to their age (usually young individuals perform in-nest tasks, and older ones foraging). Here, the way of performing the task is modified, not only the propensity to carry it or not. This data therefore adds in an interesting way to the field of collective behaviours and division of labour.

      The conclusions of the paper are well supported by the data. Measurements of the same individuals over time would have strengthened the claims.

      Strengths:

      I find that the measure of behaviour through development is of great value, as those studies are usually done at a specific time point with mature colonies. The description of a behaviour that is modified with age is a notable finding in the world of social insects. The sample sizes are adequate and all the information clearly provided either in the methods or supplementary.

      Weaknesses:

      I think the paper is failing to take into consideration or at least discuss the role of inter-individual variabilities. Tasks have been known to be undertaken by only a few hyper-active individuals for example. Comments on the choice to use averages and the potential roles of variations between individuals are in my opinion lacking. Throughout the paper wording should be modified to refer to the group and not the individuals, as it was the collective digging that was measured. Another issue I had was the use of "mature colony" for colonies with very few individuals and only 6 months of age. Comments on the low number of workers used compared to natural mature colonies would be welcome.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This study investigates how ant group demographics influence nest structures and group behaviors of Camponotus fellah ants, a ground-dwelling carpenter ant species (found locally in Israel) that build subterranean nest structures. Using a quasi-2D cell filled with artificial sand, the authors perform two complementary sets of experiments to try to link group behavior and nest structure: first, the authors place a mated queen and several pupae into their cell and observe the structures that emerge both before and after the pupae eclose (i.e., "colony maturation" experiments); second, the authors create small groups (of 5, 10, or 15 ants, each including a queen) within a narrow age range (i.e., "fixed demographic" experiments) to explore the dependence of age on construction. Some of the fixed demographic instantiations included a manually induced catastrophic collapse event; the authors then compared emergency repair behavior to natural nest creation. Finally, the authors introduce a modified logistic growth model to describe the time-dependent nest area. The modification introduces parameters that allow for age-dependent behavior, and the authors use their fixed demographic experiments to set these parameters, and then apply the model to interpret the behavior of the colony maturation experiments. The main results of this paper are that for natural nest construction, nest areas and morphologies depend on the age demographics of ants in the experiments: younger ants create larger nests and angled tunnels, while older ants tend to dig less and build predominantly vertical tunnels; in contrast, emergency response seems to elicit digging in ants of all ages to repair the nest.

      We sincerely thank Reviewer #1 for the time and effort dedicated to our manuscript's detailed review and assessment. The revision suggestions were constructive, and we will incorporate them into the next version to improve the manuscript.

      Reviewer #2 (Public review):

      I enjoyed this paper and the approach to examining an accepted wisdom of ants determining overall density by employing age polyethism that would reduce the computational complexity required to match nest size with population (although I have some questions about the requirement that growth is infinite in such a solution). Moreover, the realization that models of collective behaviour may be inappropriate in many systems in which agents (or individuals) differ in the behavioural rules they employ, according to age, location, or information state. This is especially important in a system like social insects, typically held as a classic example of individual-as-subservient to whole, and therefore most likely to employ universal rules of behaviour. The current paper demonstrates a potentially continuous age-related change in target behaviour (excavation), and suggests an elegant and minimal solution to the requirement for building according to need in ants, avoiding the invocation of potentially complex cognitive mechanisms, or information states that all individuals must have access to in order to have an adaptive excavation output.

      We sincerely thank reviewer #2 for the time and effort dedicated to our manuscript's detailed review and assessment. The insightful feedback provided by the reviewer will be incorporated into the successive revisions.

      The only real reservation I have is in the question of how this relationship could hold in properly mature colonies in which there is (presumably) a balance between the birth and death of older workers. Would the prediction be that the young ants still dig, or would there be a cessation of digging by young ants because the area is already sufficient? Another way of asking this is to ask whether the innate amount of digging that young ants do is in any way affected by the overall spatial size of the colony. If it is, then we are back to a problem of perfect information - how do the young ants know how big the overall colony is? Perhaps using density as a proxy? Alternatively, if the young ants do not modify their digging, wouldn't the colony become continuously larger? As a non-expert in social insects, I may be misunderstanding and it may be already addressed in the citations used.

      We thank the reviewer for this interesting question. We find that the nest excavation is predominantly performed by the younger ants in the nest and the nest area increase is followed by an increase in the population. However, if the young ants dig unrestricted, this could result in unnecessary nest growth, as suggested by reviewer #2. Therefore, we believe that the innate digging behavior of ants could potentially be regulated by various cues, such as:

      (a) Density-based: If the colony becomes less dense as its area expands, this could serve as a feedback signal for young ants to reduce or stop digging, as described in references (25, 29, 30).

      (b) Pheromone depositions: If the colony reaches a certain population density, pheromone signals could inhibit further digging by young ants (references 25, 29), or space usage could serve as a proxy for the nest area.

      Thus, rather than perfect information, decentralized control and digging-based local cues probably regulate the level of age-dependent digging, without the ants needing to estimate the overall colony size or nest area.

      In any case, this is an excellent paper. The modelling approach is excellent and compelling, also allowing extrapolation to other group sizes and even other species. This to me is the main strength of the paper, as the answer to the question of whether it is younger or older ants that primarily excavate nests could have been answered by an individual tracking approach (albeit there are practical limitations to this, especially in the observation nest setup, as the authors point out). The analysis of the tunnel structure is also an important piece of the puzzle, and I really like the overall study.

      We thank the reviewer for the comments. We completely agree that individual tracking of ants within our experimental setup would have been the ideal approach, but we were constrained by technical and practical limitations of the setup, as pointed out by the reviewer, such as:

      (a) Continuous tracking of ants in our nests would have required a camera to be positioned at all times in front of the nest, which necessitates a light background. Since Camponotus fellah ants are subterranean, we aimed to allow them to perform nest excavation in conditions as close to their natural dark environment as possible. Additionally, implementing such a system in front of each nest would have reduced the sample sizes for our treatments.

      (b) The experimental duration of our colony maturation and fixed demographics experiments extended for up to six months (unprecedented durations in these kinds of measurements). This naturally limited our ability to conduct individual tracking while maintaining the identity of each ant based on the current design.

      Reviewer #3 (Public review):

      Summary:

      In this study, Harikrishnan Rajendran, Roi Weinberger, Ehud Fonio, and Ofer Feinerman measured the digging behaviours of queens and workers for the first 6 months of colony development, as well as groups of young or old ants. They also provide a quantitative model describing the digging behaviours and allowing predictions. They found that young ants dig more slanted tunnels, while older ants dig more vertically (straight down). This finding is important, as it describes a new form of age polyethism (a division of labour based on age). Age polyethism is described as a "yes or no" mechanism, where individuals perform or not a task according to their age (usually young individuals perform in-nest tasks, and older ones foraging). Here, the way of performing the task is modified, not only the propensity to carry it or not. This data therefore adds in an interesting way to the field of collective behaviours and division of labour.

      The conclusions of the paper are well supported by the data. Measurements of the same individuals over time would have strengthened the claims.

      We sincerely thank reviewer #3 for the time and effort dedicated to our manuscript's detailed review and assessment. We completely agree with the reviewer’s comments on the measurements of the same individuals over time; however, we were constrained by the technical and experimental limitations described above and pointed out by reviewer #2.

      Strengths:

      I find that the measure of behaviour through development is of great value, as those studies are usually done at a specific time point with mature colonies. The description of a behaviour that is modified with age is a notable finding in the world of social insects. The sample sizes are adequate and all the information clearly provided either in the methods or supplementary.

      We thank reviewer #3 for this assessment.

      Weaknesses:

      I think the paper is failing to take into consideration or at least discuss the role of inter-individual variabilities. Tasks have been known to be undertaken by only a few hyper-active individuals for example. Comments on the choice to use averages and the potential roles of variations between individuals are in my opinion lacking. Throughout the paper wording should be modified to refer to the group and not the individuals, as it was the collective digging that was measured. Another issue I had was the use of "mature colony" for colonies with very few individuals and only 6 months of age. Comments on the low number of workers used compared to natural mature colonies would be welcome.

      Regarding main comment 1:

      We completely agree with the reviewer’s comment on considering inter-individual variability based on activity levels. We have discussed how individual morphological variability could influence digging behavior (references: 28, 31), and we will elaborate further on this aspect in future revisions.

      Regarding main comment 2:

      We agree with the reviewer’s comments regarding the wording. The term “mature colony” will be revised in future versions. We were practically limited in continuing the experiments beyond 6 months of age, predominantly because of the limited stability of the nests, which were made with a sand-soil mix. We also acknowledge that the colony sizes attained in our maturation experiments may be smaller than those of naturally matured colonies. This trend is generally observed in lab-reared colonies and could be attributed to differences in microclimatic conditions, foraging opportunities, space availability, and other factors. We will address these aspects in more detail in future revisions.

    1. eLife Assessment

      This important paper takes a novel approach to the problem of automatically reconstructing long-range axonal projections from stacks of images. The key innovation is to separate the identification of sections of an axon from the statistical rules used to constrain global structure. The authors provide convincing evidence that their method is a significant improvement over existing measures in circumstances where the labelling of axons and dendrites is relatively dense, but the robustness to image noise remains to be tested.

    2. Reviewer #1 (Public review):

      Summary:

      The authors introduce a novel algorithm for the automatic identification of long-range axonal projections. This is an important problem as modern high-throughput imaging techniques can produce large amounts of raw data, but identifying neuronal morphologies and connectivities requires large amounts of manual work. The algorithm works by first identifying points in three-dimensional space corresponding to parts of labelled neural projections; these are then used to identify short sections of axons using an optimisation algorithm and the prior knowledge that axonal diameters are relatively constant. Finally, a statistical model that assumes axons tend to be smooth is used to connect the sections together into complete and distinct neural trees. The authors demonstrate that their algorithm is far superior to existing techniques, especially when dense labelling of the tissue means that neighbouring neurites interfere with the reconstruction. Despite this improvement, however, the accuracy of reconstruction remains below 90%, so manual proofreading is still necessary to produce accurate reconstructions of axons.

      Strengths:

      The new algorithm combines local and global information to make a significant improvement on the state-of-the-art for automatic axonal reconstruction. The method could be applied more broadly and might have applications to reconstructions of electron microscopy data, where similar issues of high-throughput imaging and relatively slow or inaccurate reconstruction remain.

      Weaknesses:

      There are three weaknesses in the algorithm and manuscript.

      (1) The best reconstruction accuracy is below 90%, which does not fully solve the problem of needing manual proofreading.

      (2) The 'minimum information flow tree' model the authors use to construct connected axonal trees has the potential to bias data collection. In particular, the assumption that axons should always be as smooth as possible is not always correct. This is a good rule-of-thumb for reconstructions, but real axons in many systems can take quite sharp turns and this is also seen in the data presented in the paper (Figure 1C). I would like to see explicit acknowledgement of this bias in the current manuscript and ideally a relaxation of this rule in any later versions of the algorithm.

      (3) The writing of the manuscript is not always as clear as it could be. The manuscript would benefit from careful copy editing for language, and the Methods section in particular should be expanded to more clearly explain what each algorithm is doing. The pseudo-code of the Supplemental Information could be brought into the Methods if possible as these algorithms are so fundamental to the manuscript.

    3. Reviewer #2 (Public review):

      In this manuscript, Cai et al. introduce PointTree, a new automated method for the reconstruction of complex neuronal projections. This method has the potential to drastically speed up the process of reconstructing complex neurites. The authors use semi-automated manual reconstruction of neurons and neurites to provide a 'ground-truth' for comparison between PointTree and other automated reconstruction methods. The reconstruction performance is evaluated for precision, recall, and F1-score and positions. The performance of PointTree compared to other automated reconstruction methods is impressive based on these 3 criteria.
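
      For readers less familiar with the three evaluation criteria, the standard definitions of precision, recall, and F1-score can be sketched as below. How reconstructed elements are matched to the ground truth is specific to the manuscript and is not modelled here; the function name and the counts in the example are hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from matched and unmatched element counts."""
    predicted = true_positives + false_positives  # everything the method reported
    actual = true_positives + false_negatives     # everything in the ground truth
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical counts: 80 correct elements, 20 spurious, 10 missed.
precision, recall, f1 = precision_recall_f1(80, 20, 10)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

      Because F1 is a harmonic mean, a method cannot score well by trading a large loss in recall for a small gain in precision, or the reverse.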

      As an experimentalist, I will not comment on the computational aspects of the manuscript. Rather, I am interested in how PointTree's performance decreases in noisy samples. This is because many imaging datasets contain some level of background noise for which the human eye appears essential for the accurate reconstruction of neurites. Although the samples presented in Figure 5 represent an inherent challenge for any reconstruction method, the signal-to-noise ratio is extremely high (also the case in all raw data images in the paper). It would be interesting to see how PointTree's performance changes in increasingly noisy samples, and for the authors to provide general guidance to the scientific community as to what samples might not be accurately reconstructed with PointTree.

    1. eLife Assessment

      This study presents a valuable finding that altruistic tendency during moral decision-making is context-dependent (present in the gain domain but absent in the loss domain) and its absence in the loss domain can be restored by the neuropeptide oxytocin. However, the evidence supporting this claim is somewhat incomplete and would benefit from better overall framing and clarity on its approaches. Overall, this study will be of interest to social scientists and neuroscientists who work on moral decision-making and oxytocin.