  1. Feb 2025
    1. eLife Assessment

      Using experiments in the whitefly, this manuscript provides evidence that the bacterial symbiont Wolbachia can be transmitted from parasitoid wasps to their insect hosts. Characterizing the transfer of Wolbachia between insect species is a valuable attempt to explain the widespread distribution of this intracellular bacterium. This paper is incomplete, as it does not furnish sufficient data to support several of its claims, for which additional methods and data are necessary.

    2. Reviewer #2 (Public review):

      The resubmitted version of the paper by Yan et al. titled "Frequent intertrophic transmission of Wolbachia by parasitism but not predation" contains all the major flaws I found in the original submission. As far as I could see, the authors did not address my original concerns.

      In short:

      (1) A Portiera control MUST be included in the FISH experiments if the claim is to stand that Wolbachia is not only transferred from a parasitoid to the whitefly but also finds its way to the bacteriocytes. This is especially true for Q, a biotype for which the pattern of Wolbachia distribution has been documented as scattered in naturally infected populations. The very strong signal in the whitefly bacteriocytes implies Portiera.

      (2) In my original review I wrote: "The authors fail to discuss, or even acknowledge, a number of published studies that specifically show no horizontal transmission such as the one claimed to be detected in the study presented." In return the authors wrote in their rebuttal letter: "We have made corresponding modifications to the discussion section (Lines 256-271 in the revised manuscript) and have discussed the published studies that report no evidence of horizontal transmission (Lines 260-263 in the revised manuscript)." However, the stated lines are concerned with a different subject. In addition, in their letter the authors write "Additionally, some experiments have found no evidence of horizontal transmission of Wolbachia (39-42) (Lines 260-263 in the revised manuscript)." Besides the fact that the line numbers are wrong, the papers cited are entirely irrelevant as they do not discuss parasitoids.

      (3) My original comment on the origin of the sequences used for the phylogenetic analysis still stands. It is hard to claim a data-based search when most of the data originate in the authors' lab. The explanation of the confusion with the Qi et al. (2019) paper should at least be mentioned in the M&M. Apologies if it has been included and I missed it.

    1. eLife Assessment

      This study provides a convincing explanation for why HIV-1 Vif causes a qualitatively different cell cycle arrest to its accessory gene counterpart Vpr. The authors use elegant time-dependent microscopy reporter assays in immortalized tumor cell models to show that HIV-1 Vif causes a pseudo-metaphase arrest rather than a G2 arrest. The metaphase arrest correlates with dysregulation of the kinetochore that could be explained by the loss of phosphatase functions that determine chromosome-microtubule interactions. These valuable findings lay the groundwork for additional studies examining the mechanisms and consequences of this Vif-dependent phenotype in the viral life cycle and in primary cells more relevant to HIV-1 pathogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      Ghone et al show that HIV-1 Vif causes a pseudo-metaphase arrest rather than a G2 arrest. The metaphase arrest correlates with misregulation of the kinetochore that could be explained by the loss of phosphatase functions that determine chromosome-microtubule interactions.

      Strengths:

      The single-cell imaging using different reporters of cell cycle progression is very elegant and the quantitation is convincing. The authors clearly show that what others have characterized as a G2 arrest by flow cytometry is somewhat later in metaphase and correlates with kinetochore misregulation.

      Weaknesses:

      (1) The major problem with the paper is trying to connect what is observed in tumor cell lines with actual infections in primary T cells. While all of the descriptive work in cell lines is convincing, none of these cells are relevant targets and tumor cells have different cell death and cell cycle regulation than primary T cells. Thus, while Vif might well do all of the things described in the manuscript, it is a stretch to connect any of it to what happens in vivo. In the revised version, the authors now acknowledge this caveat.

      (2) Line 109 and elsewhere. The ability of Vif to cause cell cycle arrest and bind PP2A subunits is not a completely conserved feature. Rather, it is quite variable in different HIV-1 strains. (e.g. https://doi.org/10.1016/j.bbrc.2020.04.123 and https://elifesciences.org/articles/53036). Therefore, it is necessary for the authors to quite clearly use strain designations in the manuscript rather than a generic "Vif", and to more clearly describe the viruses being used. In the revised version, the authors now make this more clear.

      (3) Figure 5: This figure shows disruption of PP2A-B56 at the kinetochores. However, is this specific to the kinetochores? Since Vif has been described to more broadly degrade PP2A-B56, could this not be a result of a more general decrease in PP2A activity throughout the cell? In the revised version, the authors now clarify this point.

    3. Reviewer #2 (Public review):

      Summary

      The authors characterize the cell-cycle arrest induced by HIV-1 Vif in infected cells. They show this arrest is not at G2/M as previously thought but during metaphase. They show that the metaphase plate forms normally but progression to anaphase is massively delayed, and chromosome segregation is dysregulated in a manner consistent with impaired assembly of microtubules at the kinetochore. This correlates with the lack of recruitment of B56-subunits of PP2A phosphatase which are known degradation targets of Vif, suggesting that this weakens and unbalances the microtubule-mediated forces on the separating chromosomes.

      Strengths

      The authors present a very well-performed set of quantitative live cell imaging experiments that convincingly show a difference between Vif and Vpr-mediated cell cycle arrests. Through an in-depth characterization of the Vif-mediated block in metaphase, they make a strong case for this phenotype being tied to the degradation of PP2A-B56 by Vif. Furthermore, it is important that they have performed most of these experiments with virally infected cells, meaning that their observations are observable at relevant viral expression levels of Vif.

      Comments on revisions:

      The authors have addressed the concerns and have discussed them accordingly. I hope they pursue the in vivo relevance in their future work.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ghone et al show that HIV-1 Vif causes a pseudo-metaphase arrest rather than a G2 arrest. The metaphase arrest correlates with misregulation of the kinetochore which could be explained by the loss of phosphatase functions that determine chromosome-microtubule interactions.

      Strengths:

      The single-cell imaging using different reporters of cell cycle progression is very elegant and the quantitation is convincing. The authors clearly show that what others have characterized as a G2 arrest by flow cytometry is somewhat later in metaphase and correlates with kinetochore misregulation.

      We sincerely appreciate the reviewer recognizing the quality and precision of our study, particularly our use of long-term live cell imaging combined with single-cell resolution analysis.

      Weaknesses:

      (1) The major problem with the paper is trying to connect what is observed in tumor cell lines with actual infections in primary T cells. While all of the descriptive work in cell lines is convincing, none of these cells are relevant targets and tumor cells have different cell death and cell cycle regulation than primary T cells. Thus, while Vif might well do all of the things described in the manuscript, it is a stretch to connect any of it to what happens in vivo.

      We fully agree with this point. It is indeed technically challenging to perform 48-120 hours of live-cell imaging at high magnification at short intervals using primary T cells because of their non-adherent nature. We also agree that Vif’s functions in pseudo-metaphase arrest and the consequent induction of cell death, observed in cancer cells (e.g., Cal51, HeLa, and MDA-MB-231 cell lines) or normal non-transformed epithelial cells (e.g., the RPE1 cell line), may differ in T cells. Further studies and refined approaches will be required to address this important question. We have revised the manuscript to include a discussion of this issue in the “Limitations of this study” section.

      (2) Line 109 and elsewhere. The ability of Vif to cause cell cycle arrest and bind PP2A subunits is not a completely conserved feature. Rather, it is quite variable in different HIV-1 strains. (e.g. https://doi.org/10.1016/j.bbrc.2020.04.123 and https://elifesciences.org/articles/53036). Therefore, it is necessary for the authors to quite clearly use strain designations in the manuscript rather than a generic "Vif", and to more clearly describe the viruses being used.

      Thank you for raising this important point. We utilized the NL4-3 strain in our study and have revised the manuscript to specify this detail. While this study uncovered part of the mechanism by which Vif modulates phosphatase regulation during mitosis, further research is required to elucidate the full mechanism, particularly how this degradation induces a robust pseudo-metaphase arrest.

      (3) Figure 5: This figure shows disruption of PP2A-B56 at the kinetochores. However, is this specific to the kinetochores? Since Vif has been described to more broadly degrade PP2A-B56, could this not be a result of a more general decrease in PP2A activity throughout the cell?

      Thank you for highlighting this critical point. PP2A is a major serine/threonine phosphatase that regulates numerous essential cell cycle processes. To the best of our knowledge, Vif selectively targets the B56 family of PP2A regulatory subunits for degradation, without affecting the other three B-type subunit families or the catalytic core of PP2A itself. During early mitosis, all five members of the B56 family (B56α, B56β, B56γ, B56δ, and B56ε) accumulate at kinetochores and centromeres, where they play critical roles in chromosome alignment. Many PP2A-B56 substrates are also localized to kinetochores and chromosomes during mitosis. Depletion of specific B56 isoforms or introduction of phosphorylation-deficient mutants of PP2A-B56 substrates at kinetochores has been shown to result in mitotic defects, underscoring the crucial roles of PP2A-B56 in regulating kinetochore, centromere, and chromosomal functions during mitosis. Interestingly, we observed no significant cell cycle arrest during G1, S, or G2 phases in Vif-expressing cells. While PP2A-B56 likely has important roles outside of mitosis, Vif-mediated degradation of PP2A-B56 appears to selectively disrupt its mitotic functions, particularly at the kinetochore. This finding highlights a targeted mechanism by which Vif interferes with PP2A-B56-mediated regulation of mitotic processes. However, further experiments are required to elucidate the precise mechanisms underlying Vif's inhibition of the specific mitotic roles of PP2A-B56.

      Reviewer #2 (Public review):

      Summary

      The authors characterize the cell-cycle arrest induced by HIV-1 Vif in infected cells. They show this arrest is not at G2/M as previously thought but during metaphase. They show that the metaphase plate forms normally but progression to anaphase is massively delayed, and chromosome segregation is dysregulated in a manner consistent with impaired assembly of microtubules at the kinetochore. This correlates with the lack of recruitment of B56-subunits of PP2A phosphatase which are known degradation targets of Vif, suggesting that this weakens and unbalances the microtubule-mediated forces on the separating chromosomes.

      Strengths

      The authors present a very well-performed set of quantitative live cell imaging experiments that convincingly show a difference between Vif and Vpr-mediated cell cycle arrests. Through an in-depth characterization of the Vif-mediated block in metaphase, they make a strong case for this phenotype being tied to the degradation of PP2A-B56 by Vif. Furthermore, it is important that they have performed most of these experiments with virally infected cells, meaning that their observations are observable at relevant viral expression levels of Vif.

      We appreciate the reviewer’s recognition of the importance and significance of our study.

      Weaknesses

      Experimentally there is very little to criticize with respect to the cellular systems used. Data from 10.1016/j.bbrc.2020.04.123 has identified selective mutants that fail to degrade B56 while maintaining A3G degradation by Cul5, and it would be nice to confirm that such a mutant behaves like the delta-Vif virus when examining metaphase, but selective ablation of B56 during mitosis to mimic Vif would, I expect, be very challenging and beyond the scope.

      Thank you for your valuable suggestion. As also highlighted by Reviewer #1, it is true that certain variants of Vif, as discussed in 10.1016/j.bbrc.2020.04.123, differentially impact B56 degradation. Notably, some variants degrade A3G without inducing cell cycle arrest. We agree that investigating whether Vif's effects on B56 are directly linked to the mitotic arrest phenotype is an important direction for future research. Equipped with our advanced imaging tools, we are now preparing to extend our studies to include Vif variants from additional HIV-1 subtypes, including primary isolates. As you rightly pointed out, depletion of B56 is expected to be challenging as the B56 family comprises multiple isoforms, each with distinct and partially redundant roles in mitosis, particularly in microtubule assembly and spindle assembly checkpoint regulation. The functions of PP2A-B56 in mitosis are well-documented compared to the relatively new studies on Vif’s role in PP2A-B56 degradation. In human cells, the B56 family comprises 5 isoforms (B56α, B56β, B56γ, B56δ, and B56ε). While all B56 isoforms localize to kinetochores or centromeres during early mitosis, the reasons for their slightly different localization patterns (to either kinetochores or centromeres) remain unclear (Vallardi et al., eLife, 2019). Notably, these isoforms exhibit functional redundancy; thus, the depletion of any single isoform does not result in severe mitotic defects (Foley et al., Nature Cell Biology, 2011; Neumann et al., Nature, 2010). Supporting this redundancy, the overexpression of a single isoform (tested only B56α and B56γ) can rescue kinetochore function when all other isoforms are depleted (Foley et al., Nature Cell Biology, 2011; Vallardi et al., eLife, 2019). This complexity poses significant challenges to modulating the relative levels of individual B56 isoforms experimentally. 
      While these specific experiments are beyond the current scope of our study, we remain committed to advancing our understanding of the mechanisms driving Vif-induced pseudo-metaphase arrest. Your suggestion aligns with our ongoing efforts, and we will consider these experiments as we further explore this fascinating area.

      Where I would raise some criticism is in the relevance of these observations to the replication and pathogenesis of the virus itself, which the authors do not address or discuss. Firstly, despite clear data that both Vpr and Vif can lead to a cell cycle arrest in cycling cells, it has never been particularly clear why the virus does this. While I would agree with the authors that Vif results in the metaphase arrest through targeting B56-PP2A, this may not be the reason WHY the virus targets one of the cell's major phosphatases, but rather a knock-on effect of doing so. I appreciate that this is beyond the scope of the study, but it is something I feel should be discussed rather than the narrow mechanistic points made in the discussion. Secondly, the authors suggest that this activity of Vif is a major cause of apoptosis in infected cells and perhaps CD4+ T cell depletion in vivo. It would be good to quantify how much apoptosis is Vif-dependent in infected primary human CD4+ T cells rather than transformed tumor cells, and whether this correlates with the Vif-mediated induction of a pseudometaphase.

      Thank you for highlighting this important point. We completely agree that the full scope of Vif’s bi-functional roles, in both degrading the APOBEC3 family, which is essential for HIV-1 infection, and inducing cell cycle arrest, is not yet fully understood. The connection between Vif’s role in cell cycle arrest and the HIV-1 life cycle remains unclear. One possible explanation, as discussed in our study, is that Vif-induced pseudo-metaphase arrest may contribute to cell death, suggesting that Vif could play a role in the reduction of CD4+ T cells. Alternatively, Vif’s impact on cell cycle arrest, or its disruption of phosphatase activity, could facilitate HIV-1 virus production. However, further experiments, especially using primary human CD4+ T cells with similar approaches as in this study, are essential to gain deeper insights. This discussion has been included in the Limitations section of our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The first paragraph of the Introduction is not necessary and anyway is quite outdated about the current state of HIV pathogenesis. Likewise, the discussion implies that HIV pathogenesis is due to virally-induced cell death, which is also outdated by more than a decade of work demonstrating that chronic immune activation is the driver of CD4 cell decline rather than direct cytotoxicity due to viral proteins.

      We have revised the first paragraph of the Introduction.

      (2) Line 134. I do not know what Cal51 cells are, or why they are being used for an HIV study here. Some rationale for their being the cell of choice for this study should be included.

      Thank you for this suggestion. We have revised the text to clearly articulate the rationale for selecting the Cal51 cell line in this study. Briefly, this study focuses on the robust mitotic arrest induced by Vif. Capturing this phenomenon required long-term live-cell imaging over 48–120 hours, with imaging intervals of 6–12 minutes and 3–4 z-stacks per time point. These parameters presented considerable technical challenges. The Cal51 cell line was chosen because it has been genetically engineered by the CRISPR-Cas9 method to express mScarlet-tagged Histone H2B and mNeonGreen-tagged Tubulin, enabling extended live-cell imaging. Furthermore, the Cal51 cell line exhibits wild-type p53 expression and maintains a stable near-diploid karyotype, making it an ideal model for studying cell cycle progression.
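
      To give a rough sense of the acquisition load these parameters imply, the arithmetic can be sketched as follows. This is an illustrative calculation only, not part of the study's pipeline: the function name is hypothetical, "3–4 z-stacks per time point" is read here as 3–4 z-planes per time point, and the count is per fluorescence channel and per imaged position.

```python
def imaging_load(duration_h, interval_min, z_planes):
    """Return (time_points, images_per_channel) for one imaged position.

    Hypothetical helper: counts acquisition intervals only and treats
    each z-plane as one image per channel (an assumption).
    """
    time_points = int(duration_h * 60 // interval_min)
    return time_points, time_points * z_planes


# Lightest and heaviest combinations of the stated parameter ranges.
lightest = imaging_load(duration_h=48, interval_min=12, z_planes=3)
heaviest = imaging_load(duration_h=120, interval_min=6, z_planes=4)
print(lightest)  # (240, 720)
print(heaviest)  # (1200, 4800)
```

      Under these assumptions, a single position yields between a few hundred and several thousand images per channel, which is consistent with the need for a stably labeled, adherent cell line.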

      (3) A description of the viruses being used is necessary. Although the authors cite a previous paper, the names in that paper do not exactly match the names used here. I presume that is the NL4.3 strain?

      Thank you for raising this important point. We utilized the subtype B HIV-1 NL4-3 strain in our study and have revised the manuscript to specify this detail.

    1. eLife Assessment

      This study reports important findings about pre-saccadic foveal prediction and the extent to which it is influenced by the visibility of the saccade target relative to its background. The results and research methodology are solid, although the neural substrates of oscillatory pre-saccadic enhancement for high-opacity targets remain unclear. This work should be of broad interest to visual neuroscientists, as well as those aiming to understand perception in the context of eye movements and modeling of visually guided actions.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines to what extent pre-saccadic foveal prediction varies based on the visibility of the saccade target. Visibility is defined as the contrast level of the target with respect to the noise background, and it is related to the signal-to-noise ratio of the target. A more visible target facilitates the planning and execution of oculomotor behavior; however, as speculated by the authors, it can also benefit foveal prediction even if the foveal stimulus visibility is held constant. Remarkably, the authors show that presenting a highly visible saccade target is beneficial for foveal vision, as detection of stimuli with an orientation similar to that of the saccade target is improved; the lower the saccade target visibility, the less prominent this effect.

      Strengths:

      The results are convincing and the research methodology is technically sound.

      Weaknesses:

      It is still unclear why the pre-saccadic enhancement would oscillate for targets with higher opacity levels, and what would be the benefit of this oscillatory pattern. The authors do not speculate too much on this and loosely relate it to feedback processes, which are characterized by neural oscillations in a similar range.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors ran a dual task. Subjects monitored a peripheral location for a target onset (to generate a saccade to), and they also monitored a foveal location for a foveal probe. The foveal probe could be congruent or incongruent with the orientation of the peripheral target. In this study, the authors manipulated the conspicuity of the peripheral target, and they saw changes in performance in the foveal task. However, the changes were somewhat counterintuitive.

      Strengths:

      The authors use solid analysis methods and careful experimental design.

      Comments on revisions:

      The authors have addressed my previous comments.

      One minor thing is that I am confused by their assertion that there was no smoothing in the manuscript (other than the newly added time course analysis). Figure 3A and Figure 6 seem to have smoothing to me.

      Another minor comment is related to the comment of Reviewer 1 about oscillations. Another possible reason for what looks like oscillations is saccadic inhibition. When the foveal probe appears, it can reset the saccade generation process. When aligned to saccade onset, this appears as a characteristic change in different parameters that is time-locked to saccade onset (about 100 ms earlier). So, maybe the apparent oscillation is a manifestation of such resetting and it's not really an oscillation. So, I agree with Reviewer 1 about removing the oscillation sentence from the abstract.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Summary:

      This study examines to what extent this phenomenon varies based on the visibility of the saccade target. Visibility is defined as the contrast level of the target with respect to the noise background, and it is related to the signal-to-noise ratio of the target. A more visible target facilitates the planning and execution of oculomotor behavior; however, as speculated by the authors, it can also benefit foveal prediction even if the foveal stimulus visibility is held constant. Remarkably, the authors show that presenting a highly visible saccade target is beneficial for foveal vision, as the detection of stimuli with an orientation similar to that of the saccade target is improved; the lower the saccade target visibility, the less prominent the effect.

      Strengths:

      The results are convincing and the research methodology is technically sound.

      Weaknesses:

      Discussion on how this phenomenon may unfold in natural viewing conditions when the foveal and saccade target stimuli are complex and are constituted by different visual properties is lacking. Some speculations regarding feedforward vs feedback neural processing involved in the phenomenon and the speed of the feedforward signal in relation to the visibility of the target, are not well justified and not clearly supported by the data.

      We thank the reviewer for their comment. In general, we tried to address conceptual points only briefly in this Research Advance if we had discussed them in depth in our main article, to which this advance will be linked (Kroell & Rolfs, 2022: https://elifesciences.org/articles/78106). However, the reviews showed us that this made our theoretical reasoning in the current manuscript appear incomplete. In the revised Discussion section, we have elaborated on several conceptual questions. In particular, we expand on the transferability of our findings to natural viewing conditions:

      “Foveal prediction in natural visual environments

      As noted above, human observers typically move their eyes towards the most conspicuous objects in their environment (’t Hart, Schmidt, Roth, & Einhäuser, 2013). Foveal prediction seems to benefit from this strategy as the strength of the predicted signal increases with the conspicuity of the eye movement target. Nonetheless, natural visual environments as well as naturalistic viewing behavior pose several challenges for the foveal prediction mechanism (see Kroell & Rolfs, 2022, for an initial discussion).

      First, naturalistic saccade target stimuli will likely exhibit complex shapes and, more often than not, will include feature conjunctions rather than isolated features. Previous findings suggest that the foveal feedback mechanism is capable of operating at this level of complexity: High-level peripheral information such as the category of novel, rendered objects (Williams et al., 2008) has been successfully decoded from activation in foveal retinotopic cortex. If, indeed, temporal object-specific areas such as area TE send feedback, the foveal prediction mechanism may even be specialized for the transfer of complex visual properties.

      Second, foveal input will often be of high contrast in natural visual environments. If fed-back predictive signals can influence foveal perception in the presence of high-contrast feedforward input remains to be established. In our main investigation (Kroell & Rolfs, 2022; Figure 2B) as well as in previous studies (Hanning & Deubel, 2022b), pre-saccadic foveal detection performance decreased markedly in the course of saccade preparation, presumably because visuospatial attention gradually shifted towards the saccade target and away from the foveal location. This presaccadic decrease in foveal sensitivity may boost the relative weight of fed-back signals by attenuating the conspicuity of high-contrast feedforward input. In other words, the strength of feedforward input to the fovea is reduced gradually across saccade preparation. At the same time, the strength of the fed-back predictive signal should profit from the high contrast of naturalistic saccade targets.

      Third, while foveal and peripheral information was congruent on 50% of all ‘probe present’ trials in our investigation, peripheral and foveal features will often be weakly correlated or even uncorrelated in natural environments (see Samonds, Geisler, & Priebe, 2018). Again, the presaccadic attenuation of foveal feedforward processing may allow fed-back peripheral signals to influence perception even if they are uncorrelated with foveal information. Moreover, in piloting variations of our paradigm, we observed that the subjective impression of perceiving the saccade target at the pre-saccadic foveal location is most pronounced if the foveal noise region is replaced with a black Gaussian blob at certain time points before saccade onset (unpublished phenomenological accounts). In consequence, fed-back signals do not seem to require correlated feedforward input to influence perception. Quantitative evidence, however, remains to be established.

      Lastly, pre-saccadic foveal input is likely less relevant during natural viewing behavior than it is in our task. It is possible that this task-induced prioritization of the foveal location facilitated the emergence of congruency effects. In a previous experiment (Kroell & Rolfs, 2022; Figure 1D), however, the perceptual probe could appear anywhere on a horizontal axis of 9 dva length around the fixation location. Despite this spatial unpredictability, congruency effects peaked at the presaccadic foveal location, even after peripheral baseline performances had been raised to a foveal level through an adaptive increase in probe opacity. On a similar note, the orientation of the saccade target is irrelevant to the behavioral task in our design, mirroring naturalistic situations: The eye movement can be planned and executed based on local contrast variations alone, and observers are never required to report on the orientation of the peripheral target stimulus. Ultimately, however, an influence of task demands on visual processing can only be fully excluded through techniques that provide a direct readout of perceptual contents without requiring overt responses. In psychophysical investigations, a prediction of saccade target motion may be read out from observers’ eye velocities (Kroell, Mitchell, & Rolfs, 2023; Kwon, Rolfs, & Mitchell, 2019). In electroencephalographic (EEG) and electrophysiological studies, foveal predictions should manifest in early visually evoked potentials (e.g., Creel, 2019) and increased firing rates of feature-selective foveal neurons in early visual areas, respectively. In conclusion, previous findings (Williams et al., 2008), the assumed properties of the neuronal feedback mechanism (Williams et al., 2008; Bullier, 2001) and characteristics of our current and previous experimental paradigms collectively suggest that foveal feature predictions are likely to transfer to naturalistic environments and viewing situations. Experimental evidence remains to be established.”

      We have furthermore modified the Abstract to emphasize the connection of the current manuscript to the main article.

      With respect to the reviewer’s point that “speculations regarding feedforward vs feedback neural processing involved in the phenomenon and the speed of the feedforward signal in relation to the visibility of the target, are not well justified”: 

      Again, we understand that we should have elaborated on our theoretical reasoning in this Research Advance. The assumption that our initial findings rely on neuronal feedback to foveal retinotopic cortex is derived from Williams et al.’s (2008) seminal findings: In an fMRI study, the category of peripherally presented objects could be decoded from voxels in foveal retinotopic cortex, suggesting that peripheral visual information was available to neurons with strictly foveal receptive fields. We extended these findings to saccade preparation, suggesting that feedback from higher-order, non-retinotopically organized visual areas may transmit information without the requirement of efference copies (see Kroell, 2023; Dissertation; https://doi.org/10.18452/27204, pp. 54-59): Irrespective of the vector of the upcoming saccade, the features of the attended saccade target would invariably be relayed to foveal retinotopic cortex. Ultimately, only anatomical and functional studies in non-human primates can conclusively establish the role of feedback connections in the observed foveal prediction effects. At present, however, this parsimonious model could account for all of our current and previous findings, that is, a temporally, spatially and feature-specific anticipation of saccade target properties in the presaccadic center of gaze. Nonetheless, we are open to considering any other mechanism that may account for our findings, and have integrated the explanation provided by the reviewer into the paragraph on potential thalamic mechanisms (see the reviewer’s Major Point 1).

      Concerning the point that “some speculations regarding feedforward vs feedback neural processing […] and the speed of the feedforward signal in relation to the visibility of the target are not well justified and not clearly supported by the data”:

      Theoretical considerations on the impact of peripheral target contrast on feedforward processing speed were a main motivation for the current study. We apologize if our theoretical reasoning was incomplete and have added additional references and elaborations to the Introduction: 

      “In particular, neuronal response latencies decrease systematically as the contrast of visual input increases. While this phenomenon is reliably observed at varying stages of the visual processing hierarchy—such as the lateral geniculate nucleus (Lee, Elepfandt, & Virsu, 1981b), primary visual cortex (e.g., Albrecht, 1995; Carandini & Heeger, 1994; Carandini, Heeger, & Movshon, 1997; Carandini, Heeger, & Senn, 2002), and anterior superior temporal sulcus (STSa; Oram, Xiao, Dritschel, & Payne, 2002; van Rossum, van der Meer, Xiao, & Oram, 2008)—influences of contrast on neuronal response latency are particularly pronounced in higher-order visual areas: A doubling of stimulus contrast has been shown to decrease the latency of V1 neurons by 8 ms, compared to a reduction of 33 ms in area STSa (Oram et al., 2002; van Rossum et al., 2008). Assuming that the peripheral target is processed in a bottom-up fashion until it reaches higher-order object processing areas, the time point at which peripheral signals are available for feedback should be dictated by the temporal dynamics of visual feedforward processing.”

      Concerning the interpretation of the observed time courses, and regarding the reviewer’s Major points 3 & 6, we substantially revised the Results and Discussion section. In brief, we deemphasized the claim/interpretation of faster enhancement with increasing target opacity and instead focus on describing the oscillatory pattern mentioned by the reviewer. We provide a more temporally resolved pre-saccadic time course using a moving-window analysis and discuss all suggested and further alternative explanations (i.e., saccade-locked perceptual or attentional oscillations, longer signal accumulation intervals for low-contrast information, oscillatory nature of feedback signaling). Details and full revised paragraphs are provided in the response to this reviewer’s Major points 3 & 6.

      Unfortunately, there is no line numbering in the manuscript version I downloaded so I cannot refer to the specific lines of text here.

      We apologize for the inconvenience and have added line numbers.

      Major:

      (1) The authors speculate that the phenomenon of pre-saccadic foveal prediction arises from feedback connections from higher-order visual areas, which relay relevant saccade target features to the foveal retinotopic cortex. These feedback signals are then presumably combined with feedforward foveal input to the early visual cortex and facilitate the detection of target-congruent features at the center of gaze. This interpretation is sensible, however, it may not be the only plausible scenario. The thalamus receives copies of feedforward and feedback connections between all visual areas and is a likely candidate hub for combining information across visual space. In this latter case, the phenomenon of pre-saccadic foveal prediction may not arise from feedback from higher-order visual areas, but rather from a combination of signals occurring at the level of the thalamus. The authors should either acknowledge this possibility and the fact that this phenomenon is not necessarily the result of a feedback loop, or they should explain their rationale for excluding this scenario.

      We thank the reviewer for their highly thoughtful suggestion, and for alerting us to relevant literature. We have added the following paragraph to the Discussion section. In brief, we discuss the thalamic pulvinar as either an intermediate modulatory region or as the final receiver of the fed-back signal. Yet, we assume that, to solve the combinatorial issue associated with a transfer of feature information before saccades with any possible direction and amplitude, the contribution of non-retinotopic, higher-order object processing areas is likely required.

      “Neural implementation of foveal prediction

      Based on the body of our findings as well as previous literature, we suggested a parsimonious feedback mechanism to underlie the observed effects: the preparation of a saccadic eye movement, and the concomitant shift of pre-saccadic attention (e.g., Kowler, Anderson, Dosher, & Blaser, 1995; Deubel & Schneider, 1996), selects the peripheral target stimulus among competing information. Higher-order visual areas feed selected feature input back to early retinotopic areas, specifically to neurons with foveal receptive fields. Fed-back feature information combines with congruent, foveal feedforward input, resulting in the enhancement effects we observe. Especially in the context of active vision, this feedback mechanism is appealing as it resolves a combinatorial issue associated with feature-specific information transfer before saccades. Consider a simplified case in which, right before a saccadic eye movement, the activation of a feature-selective neuron that encodes a certain retinal location is transferred to a neuron within the same brain area that will encode said retinal location after saccade landing. For this mechanism to function for any possible saccade direction and amplitude, most neurons would need to be connected to most other neurons (or, in a simplified version, to neurons with foveal receptive fields) in a given brain area. Assuming an information transmission via feedback rather than horizontal connections significantly reduces this dimensionality: Higher-order visual areas that encode object properties (largely) detached from retinotopic or spatiotopic reference frames selectively transfer feature information to neurons with foveal receptive fields, irrespective of the vector of the upcoming saccade. This parsimonious mechanism would have shortcomings. In particular, foveal feedback should become less effective during saccade sequences where several peripheral targets are simultaneously attended. Feature information at both attended target locations may be fed back in temporal succession or weighted and erroneously combined into a single fed-back signal. In most cases, however, foveal feedback may reasonably achieve what established transsaccadic mechanisms struggle to explain: An anticipation of the features of a single saccade target (which typically constitutes the currently most relevant object in the visual field) in foveal vision.

      While direct feedback connections from higher-order to early visual areas would constitute the most straightforward implementation, it is conceivable that feedback signals are relayed through and modulated by subcortical areas. In particular, the thalamic pulvinar has been identified as a connection hub for visual processing that receives copies of feedforward and feedback connections from different visual areas and may even combine information across visual space (Cortes, Ladret, Abbas-Farishta, & Casanova, 2024). In the case of foveal prediction, thalamic neurons may receive fed-back signals from higher-order areas and enhance those signals before passing them on to cortical neurons with foveal receptive fields. Perhaps, a modification of foveal activation within the thalamic pulvinar itself is sufficient to influence perception. To the best of our understanding, however, the fed-back signal must originate in non-retinotopic, higher-order object processing areas to reduce the number of necessary neuronal connections.”

      (2) The results presented are very compelling. I wonder to which extent they generalize to situations in which the foveal input and the peripheral input are more heterogenous (e.g., faces or complex objects composed of many different features, orientations, and other visual properties). I think the current research raises a number of interesting questions. In general, it would be important for the readers to elaborate more on how the mechanism of pre-saccadic foveal prediction may play out in normal viewing conditions or in conditions in which the foveal input is completely irrelevant to the task.

      We agree and have reiterated this point in the current manuscript (see our first reply to “Weaknesses”). We also explicitly refer to Kroell & Rolfs (2022) for an extensive initial discussion of this question.

      (3) On page 10 the authors state that their data suggest that foveal enhancement emerges in earlier stages of saccade preparation as target opacity increases. However, this is not clear from the figures, when performance is locked to saccade onset (Fig 3 C), for the highest opacity targets performance seems to oscillate, however, the authors do not comment on that. There is literature showing how saccades can reset perceptual oscillations, and maybe what is observed here is just a stronger performance oscillation when the saccade target is more visible. Why would performance drop systematically 75 ms before saccade onset and then increase again 25 ms before the onset? Can the authors elaborate more on this?

      In response to this comment, we inspected the pre-saccadic time course of enhancement effects in a more temporally resolved fashion and, indeed, observed pronounced oscillations for the two higher target opacity conditions (see Results): 

      “Especially at higher target opacities, the temporal development of foveal enhancement appears to exhibit an oscillatory pattern. To inspect this incidental observation in a more temporally resolved fashion, we determined mean enhancement values in a boxcar window of 50 ms duration sliding along all saccade-locked probe offset time points (step size = 10 ms; x-axis values in Figure 4 indicate the latest time point in a certain window). We then fitted 6th order polynomials (with no constraints on parameters) to the resulting time courses and compared the fitted values against zero using bootstrapping (see Methods). The average foveal enhancement across target opacities reached significance starting 115 ms before saccade onset (gray curve in Figure 4; all ps < .046). For every individual target opacity condition, we observed significant enhancement immediately before saccade onset, although only very briefly for the lowest opacity (-2–0 ms for 25%; -39–0 ms for 39%, -106–0 ms for 59% &  -13–0 ms for 90%; all ps < .050; yellow to dark red curves in Figure 4). Especially for the higher two target opacities, we observed a local maximum preceding eye movement onset by approximately 80 ms. Interestingly, assuming a peak in enhancement in approximately 80 ms intervals (i.e., at x-axis values of -80 and 0 ms in Figure 4) would correspond to an oscillation frequency of 12.5 Hz. In contrast to rapid feedforward processing, feedback signaling is associated with neural oscillations in the alpha and beta range (i.e., between 7 and 30 Hz; Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015).”
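
      The moving-window step and the frequency estimate described in this paragraph can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: the function names, the toy data, and the time range are assumptions, and the 6th-order polynomial fit and the bootstrap test are deliberately left out.

```python
from statistics import mean


def sliding_boxcar_mean(times_ms, values, width_ms=50, step_ms=10,
                        t_first=-150, t_last=0):
    """Mean of `values` in a boxcar window ending at each time point.

    Each window covers (t_end - width_ms, t_end]; t_end is the latest time
    point in the window, mirroring the x-axis convention described above.
    Windows that contain no samples yield None.
    """
    out = []
    t_end = t_first
    while t_end <= t_last:
        in_window = [v for t, v in zip(times_ms, values)
                     if t_end - width_ms < t <= t_end]
        out.append((t_end, mean(in_window) if in_window else None))
        t_end += step_ms
    return out


def peak_interval_to_hz(interval_ms):
    """Convert the spacing between successive peaks (ms) to a frequency (Hz)."""
    return 1000.0 / interval_ms


# Hypothetical toy data: probe offsets relative to saccade onset (ms) and
# per-sample enhancement values (congruent minus incongruent hit rate).
toy_times = [-140, -120, -95, -80, -75, -40, -30, -10, -5, 0]
toy_enh = [0.00, 0.01, 0.03, 0.06, 0.05, 0.01, 0.02, 0.05, 0.06, 0.07]

course = sliding_boxcar_mean(toy_times, toy_enh)
print(course[-1])               # window ending at saccade onset
print(peak_interval_to_hz(80))  # peaks 80 ms apart -> 12.5 Hz
```

      The conversion in the last line is the arithmetic behind the 12.5 Hz figure: one cycle per 80 ms equals 1000 / 80 cycles per second.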

      We had observed an oscillatory pattern in multiple previous investigations, and in both Hit Rates to foveal orientation content and reflexive gaze velocities in response to peripheral motion information. So far, we have been unsure how to explain it. The literature on thalamic visual processing mentioned by the reviewer alerted us to the oscillatory nature of feedback signaling itself. Interestingly, the temporal frequency range of feedback oscillations includes the frequency of ~12.5 Hz observed in our data. We have included this and alternative explanations in the Discussion section (see below). Throughout, we highlight that we are aware that our analysis approach is purely descriptive and that the potential explanations we give are speculative.

      “Moreover, foveal congruency effects appear to exhibit an oscillatory pattern, with peaks in a medium saccade preparation stage (~80 ms before the eye movement) and immediately before saccade onset. We have noticed this pattern in several investigations with substantially different visual stimuli and behavioral readouts. For instance, using a full-screen dot motion paradigm, we observed a pre-saccadic, small-gain ocular following response to coherent motion in the saccade target region (Kroell, Rolfs, & Mitchell, 2023, conference abstract; Kroell, 2023, dissertation). Predictive ocular following first reached significance ~125 ms before the eye movement, then decreased and subsequently ramped up again ~25 ms before saccade onset. Several explanatory mechanisms appear conceivable. Unlike rapid feedforward processing, feedback propagation has been shown to follow an oscillatory rhythm in the alpha and beta range, that is, between 7 and 30 Hz (Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015). In our case, it is possible that the object-processing areas that send feedback to retinotopic visual cortex do so at a temporal frequency of ~12.5 Hz. At higher stimulus contrasts, feedforward signals may be fed back instantaneously and without the need for signal accumulation in feedback-generating areas. The resulting perceptual time courses may reflect innate temporal feedback properties most veridically. Alternatively, the initial enhancement peak may be related to the sudden onset of the saccade target stimulus and not to movement preparation itself. In this case, the initial peak should become particularly apparent if enhancement is aligned to the onset of the target stimulus. Yet, Figure 3 and Figure 4 suggest more prominent oscillations in saccade-locked time courses. In accordance with this, perceptual and attentional processes have been shown to exhibit oscillatory modulations that are phase-locked to action onset (e.g., Tomassini, Spinelli, Jacono, Sandini, & Morrone, 2015; Hogendoorn, 2016; Wutz, Muschter, van Koningsbruggen, Weisz, & Melcher, 2016; Benedetto & Morrone, 2017; Tomassini, Ambrogioni, Medendorp, & Maris, 2017; Benedetto, Morrone & Tomassini, 2019). Whether the oscillatory pattern of foveal enhancement, as well as its increased prominence at higher target contrasts, relies on innate temporal properties of feedback signaling, signal accumulation, saccade-locked oscillatory modulations of feedforward processing or attention, or a combination of these factors, one conclusion remains: task-induced cognitive influences suggested to underlie the considerable variability in temporal characteristics of foveal feedback during passive fixation (e.g., Fan et al., 2016; Weldon et al., 2016; 2020) are not the only possible explanation. Low-level target properties such as its luminance contrast modulate the resulting time course and should be equally considered, at least in our paradigm.”

      In the revised Abstract, we removed our claim on an earlier emergence of enhancement at higher opacities and have added this summary instead:

      “Second, the time course of foveal enhancement appeared to show an oscillatory pattern that was particularly pronounced at higher target opacities. Interestingly, the temporal frequency of these oscillations corresponded to the frequency range typically associated with neural feedback signaling.”

      (4) What was the average difference in latency between short and long latencies? It would be good to report it in the main text.

      We apologize for the oversight. The difference was 61 ms, with latencies of md = 247±18 ms for short- and md = 308±18 ms for long-latency saccades. We have added this information to the main text.

      (5) From the saccade latency graphs in Figure S1 it seems there is some variability in the latency of saccades across subjects, I wonder if there is a correlation between saccade latency and the magnitude of the foveal prediction effect across subjects.

      We had inspected a connection between saccade latency and congruency in our first investigation (Kroell & Rolfs, 2022; not reported) and observed that participants with lower latencies tended to show more enhancement, albeit non-significantly. Likewise, we observed a non-significant negative correlation between the median saccade latency and the mean foveal prediction effect (across opacities and time points) in the current investigation, r = -0.22, p = .572. While our study involved a small number of observers (n = 9), the analysis approach illustrated in Figure 2 A-C instead makes use of the large number of trials collected per participant (mean n = 2841 trials per observer) and demonstrates a reliable influence of saccade latency on an individual-observer level.
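
      As a rough plausibility check on the reported correlation, the sketch below converts r into a t statistic and a two-sided p-value. It rests on assumptions that the reply does not state: that r is a Pearson coefficient, that it was tested with a two-sided t-test on n - 2 degrees of freedom, and that the quoted r is rounded. The helper names are hypothetical, and the p-value is obtained by numerical integration to keep the example dependency-free.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2.0 * total * h / 3.0  # P(|T| <= |t|), using symmetry
    return 1.0 - central_mass


r, n = -0.22, 9  # values quoted in the reply; r is rounded to two decimals
t = t_from_r(r, n)
p = two_sided_p(t, n - 2)
print(round(t, 2), round(p, 2))
```

      Under these assumptions the result lands close to the reported p = .572; the small remaining difference is what one would expect from rounding r to two decimals.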

      (6) Page 14, the authors state that their findings suggest that the feedforward processing of the peripheral saccade target is accelerated when it is presented at high contrast. I find this a bit too speculative, both in terms of assuming that there is a feedforward vs a feedback process (see my point 1) and in terms of speculating that the feedforward process is accelerated as I do not see a clear hint of this in the data (see my point 3) and it is a bit of a stretch to speculate on delays or accelerations of neural processing. It is possible that the feedforward signal is always delivered at the same speed but it is weaker in one case and the effect needs more time to build up.

      We fully agree and hope to have addressed the reviewer’s arguments in the sections preceding this point. We included the reviewer’s last sentence in the Discussion section as well: 

      “Alternatively, or in addition, it is conceivable that weaker feedforward signals require a longer accumulation interval before the feedback process can be initiated.”

      Minor:

      (1) I think the description of the linear mixed-effects model can go in the supplemental methods, if possible, and its results can be briefly mentioned in the text.

      In previous work, we have been asked to move linear mixed-effects model descriptions from supplemental to main method (or even results) sections for clarity. We have followed this suggestion ever since and, due to the relevance of the models for the interpretation of the presented results, would like to keep their description in the methods section.

      (2) This is just a minor point, but I would suggest using a different word instead of opacity (maybe visibility?).

      We had gone back and forth on this. We decided to use the term ‘conspicuity’ when we discuss our findings conceptually and the term ‘opacity’ when we refer to the experimental manipulation (since we directly manipulate the transparency, i.e., 1-opacity, of the target patch against the background). To compute the slopes in Figures 2 and 5, we ordered observers’ performances by the linearly spaced opacity conditions. Since the term ‘opacity’ is closest to both the experimental manipulation and the variable entered into analysis, we would like to adhere to this terminology. However, we have added an explicit note to the end of our introduction to avoid confusion: 

      “Throughout the paper, we use the term ‘opacity’ when we refer to the experimental manipulation (that is, a variation of the transparency, i.e., 1-opacity of the target patch against the background noise) and the term ‘conspicuity’ when we discuss our findings conceptually.”

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors ran a dual task. Subjects monitored a peripheral location for a target onset (to generate a saccade to), and they also monitored a foveal location for a foveal probe. The foveal probe could be congruent or incongruent with the orientation of the peripheral target. In this study, the authors manipulated the conspicuity of the peripheral target, and they saw changes in performance in the foveal task. However, the changes were somewhat counterintuitive.

      Strengths:

      The authors use solid analysis methods and careful experimental design.

      Weaknesses:

      I have some issues with the interpretation of the results, as explained below. In general, I feel that a lot of effects are being explained by attention and target-probe onset asynchrony etc, but this seems to be against the idea put forth by the authors of "foveal prediction for visual continuity across saccades". Why would foveal prediction be so dependent on such other processes? This needs to be better clarified and justified.

      We address the described weaknesses in the respective sections below. In general, as we point out in response to Reviewer 1 as well, the current submission is a Research Advance article meant to supplement our main article (Kroell & Rolfs, 2022, https://doi.org/10.7554/eLife.78106). To comply with the eLife recommendations for Research Advance submissions, we addressed conceptual points only briefly, especially if they had been explained in detail in our main article. To make the nature and format of the current submission as explicit as possible, and to emphasize its connection to our previous work, we refer to the submission format in our abstract and introduction now.

      Specifics:

      The explanation of decreased hit rates with increased peripheral target opacity is not convincing. The authors suggest that higher contrast stimuli in the periphery attract attention. But, then, why are the foveal results occurring earlier (as per the later descriptions in the manuscript)? And, more importantly, why would foveal prediction need to be weaker with stronger pre-saccadic attention to the periphery? What is the function of foveal prediction? What of the other interpretation that could be invoked in general for this type of task used by the authors: that the dual task is challenging and that subjects somehow misattribute what they saw in the peripheral task when planning the saccade. i.e. foveal hit rates are misperceptions of the peripheral target. When the peripheral target is easier to see, then the foveal hit rate drops.

      We will address these comments one by one:

      The authors suggest that higher contrast stimuli in the periphery attract attention. But, then, why are the foveal results occurring earlier (as per the later descriptions in the manuscript)?

      We consider these observations to rely on separate processes. Already in the main publication (Kroell & Rolfs, 2022), we had observed a continuous decrease of target-congruent and target-incongruent foveal Hit Rates (HRs) during saccade preparation, and suggested that this decrease (similarly observed in Hanning & Deubel, 2022b) is likely caused by the pre-saccadic shift of visuospatial attention to the target. In other words, as attentional resources shift towards the periphery, foveal detection performance is hampered, irrespective of peripheral and foveal feature (in-)congruency. In the current investigation, we again observed a pronounced pre-saccadic decrease of foveal HRs, irrespective of foveal probe orientation. Our argument that high-contrast peripheral saccade targets attract more attention relies on the clear observation that this decrease becomes more pronounced as the contrast of the saccade target increases. To the best of our judgment and experience with doing the task ourselves, this interpretation appears very conceivable. We explain this rationale in the Abstract and the Results sections of the manuscript (see below).

      Our hypotheses and interpretations concerning the time course of foveal prediction refer to the difference between target-congruent and target-incongruent foveal HRs (i.e., to predictive foveal feature enhancement). Irrespective of the general, feature-unspecific decrease of foveal detection performances, we had hypothesized that the peripheral target is processed faster if it exhibits a high contrast. This assumption is based on temporal processing properties of many visual neurons that we have expanded on in our revision: 

      “In particular, neuronal response latencies decrease systematically as the contrast of visual input increases. While this phenomenon is reliably observed at varying stages of the visual processing hierarchy—such as the lateral geniculate nucleus (Lee et al., 1981b), primary visual cortex (e.g., Albrecht, 1995; Carandini et al., 1997, 2002; Carandini and Heeger, 1994), and anterior superior temporal sulcus (STSa; Oram et al., 2002; van Rossum et al., 2008)— influences of contrast on neuronal response latency are particularly pronounced in higher-order visual areas: A doubling of stimulus contrast has been shown to decrease the latency of V1 neurons by 8 ms, compared to a reduction of 33 ms in area STSa (Oram et al., 2002; van Rossum et al., 2008). Assuming that the peripheral target is processed in a bottom-up fashion until it reaches higher-order object processing areas, the time point at which peripheral signals are available for feedback should be dictated by the temporal dynamics of visual feedforward processing.”

      Of note, both reviewers asked us to explore the oscillatory nature of the difference between target-congruent and target-incongruent HRs. We will post our changes in response to the reviewer’s remark below.

      And, more importantly, why would foveal prediction need to be weaker with stronger pre-saccadic attention to the periphery?

      We hope that our previous reply has cleared up that the opposite is true: In general, and irrespective of the feature congruency of target and foveal probe, foveal HRs decrease as target contrast increases. As we have stated in our Abstract and Results, “foveal Hit Rates for target-congruent and incongruent probes decreased as target opacity increased, presumably since attention was increasingly drawn to the target the more salient it became. Crucially, foveal enhancement defined as the difference between congruent and incongruent Hit Rates increased with opacity”. This finding did not appear counterintuitive to us and was, in fact, pre-registered as a main hypothesis (see https://osf.io/wceba).

      We are unsure if this goes beyond the reviewer’s concern but we, in fact, speculate in the revised Discussion section as well as in our original eLife article that the overall, feature-unspecific decrease in foveal detection performances may aid feature-specific foveal prediction: 

      “This pre-saccadic decrease in foveal sensitivity may boost the relative weight of fed-back signals by attenuating the conspicuity of high-contrast feedforward input. In other words, the strength of feedforward input to the fovea is reduced gradually across saccade preparation. At the same time, the strength of the fed-back predictive signal should profit from the high contrast of naturalistic saccade targets.”

      What is the function of foveal prediction?

      Please refer to the section ‘What is the function of foveal prediction?’ in our main article. We have pasted this paragraph below for the reviewer’s convenience. 

      “What is the function of foveal prediction?

      As stated above, previous investigations on foveal feedback required observers to make peripheral discrimination judgments. We, in contrast, did not ask observers to generate a perceptual judgment on the orientation of the saccade target. Instead, detecting the target was necessary to perform the oculomotor task. While the identification of local contrast changes would have sufficed to direct the eye movement, the orientation of the target enhanced foveal processing of congruent orientations. The automatic nature of foveal enhancement showcases that perceptual and oculomotor processing are tightly intertwined in active visual settings: planning an eye movement appears to prioritize the features of its target; commencing the processing of these features before the eye movement is executed may accelerate post-saccadic target identification and ultimately provide a head start for corrective gaze behavior (Deubel et al., 1982; Ohl and Kliegl, 2016; Tian et al., 2013).”

      What of the other interpretation that could be invoked in general for this type of task used by the authors: that the dual task is challenging and that subjects somehow misattribute what they saw in the peripheral task when planning the saccade. i.e. foveal hit rates are misperceptions of the peripheral target. When the peripheral target is easier to see, then the foveal hit rate drops.

      Alternative explanations in general: In our main article, we ruled out—either through direct experimentation or by considering relevant properties of our findings—the following alternative explanations: i) spatially global feature-based attention to the target orientation, ii) a multiplicative combination of spatial and feature-based attention, and iii) shifts of decision criterion. While dual tasks (i.e., simultaneous oculomotor planning and perceptual detection) are standard in psychophysical investigations of active vision, we acknowledge the potential influence of an explicit foveal task in the revised manuscript, and in response to both reviewers: 

      “Lastly, pre-saccadic foveal input is likely less relevant during natural viewing behavior than it is in our task. It is possible that this task-induced prioritization of the foveal location facilitated the emergence of congruency effects. In a previous experiment (Kroell & Rolfs, 2022; Figure 2D), the perceptual probe could appear anywhere on a horizontal axis of 9 dva length around the screen center. Despite this spatial unpredictability, however, congruency effects peaked at the pre-saccadic foveal location, even after peripheral baseline performances had been raised to a foveal level through an adaptive increase in probe opacity. Ultimately, an influence of task demands on visual processing can only be fully excluded through techniques that provide a direct readout of perceptual contents without requiring keyboard responses. In psychophysical investigations, a prediction of saccade target motion may be read out from observers’ eye velocities (Kroell, Mitchell, & Rolfs, 2023; Kwon, Rolfs, & Mitchell, 2019). In electroencephalographic (EEG) and neurophysiological studies, foveal predictions should manifest in early visual evoked potentials (e.g., Creel, 2019) and increased firing rates of feature-selective foveal neurons in early visual areas, respectively.”

      Difficulty of the task: Concerning the perceptual detection task, every experimental session was preceded by an adaptive staircase procedure that adjusted the transparency of the foveal probe (and, thus, task difficulty) depending on the respective observer’s performance (see Methods for details). Concerning the oculomotor task, observers were able to perform accurate saccades with typical movement latencies for all target opacity conditions (see Results, Supplements & Figure S1). In general, we are unsure how high task difficulty could produce a feature-, temporally and spatially specific enhancement of both filtered and incidental target-congruent foveal orientation information. In fact, a main finding of our current submission is that foveal HRs decrease as the target becomes easier to see and the oculomotor task thus becomes easier to perform.

      Perceptual confusion of target and probe stimulus: We observe a specific increase in HRs for foveal probes that exhibit the same orientation as the peripheral saccade target. Just like in our main article, a response is defined as a ‘Hit’ if a foveal probe is presented and the observer generates a ‘present’ judgment. To our understanding, the suggestion that a confusion of target and probe stimuli may account for these effects necessarily implies that this confusion hinges on the congruency between peripheral and foveal feature inputs. In other words, peripheral and foveal signals should be more readily “confused” if they exhibit similar features. We assume that peripheral feature information is fed back to neurons with foveal receptive fields and combines with feature-congruent feedforward input. Whether this combination of signals can be described as low-level perceptual “confusion” likely depends on individual linguistic judgments (it would certainly be a novel description of feedback-feedforward interactions). Perhaps a defining difference between the reviewer’s concern and our assumed mechanism is the spatial specificity of the resulting congruency effects. We suggest that only neurons with foveal receptive fields receive feature information via feedback. And indeed, we demonstrate a clear spatial specificity of congruency effects around the pre-saccadic foveal location, even after parafoveal performances had been raised to a foveal level by an adaptive increase in probe opacity (see Kroell & Rolfs, 2022; Figure 2C & Figure 3). In other words, observers’ perception is altered in their pre-saccadic center of gaze while the target is presented peripherally. We struggle to conceive a scenario in which a confusion of signals should be feature-specific as well as specific to an interaction between peripheral and foveal signals without being meaningful at the same time. If the reviewer is referring to confusions on the response or decision level, we would like to point them towards the Discussion section ‘Can our findings be explained by established mechanisms other than foveal prediction?’ in our main article. In this paragraph, we provide detailed arguments for a dissociation between our findings and shifts in decision criterion that would exceed the scope of a Research Advance.

      When the peripheral target is easier to see, then the foveal hit rate drops.

      We agree. Target-congruent and incongruent foveal HRs decreased as the contrast of the probe increased. However, and as we stated in response to the reviewer’s first comment, the difference between target-congruent and target-incongruent foveal HRs (and, thus, foveal enhancement of the target orientation) increased with peripheral target contrast.

      The analyses of Fig. 3C appear to be overly convoluted. They also imply an acknowledgment by the authors that target-probe temporal difference matters. Doesn't this already negate the idea that the foveal effects are associated with the saccade generation process itself? If the effect is related to target onset, how is it interpreted as related to a foveal prediction that is associated with the saccade itself? 

      We indeed conducted analyses that can reveal an influence of target presentation duration at probe onset, the saccade preparation stage at probe offset, as well as a combination of both factors. The fact that target presentation duration may have an influence on foveal prediction would not negate a simultaneous influence of saccade preparation and vice versa. In the main article, we directly investigated the influence of saccade preparation on foveal enhancement by introducing a passive fixation condition (Kroell & Rolfs, 2022; Figure 5). At identical target-probe offset durations, pre-saccadic foveal enhancement was significantly more pronounced and accelerated compared to enhancement during passive fixation. We have added a purely saccade-locked time course (uncorrected by target-probe interval) to our Results section and to Figure 3 (second row). We still believe that the target-locked, saccade-locked and combined analyses are informative for future investigations and would like to present them all for completeness.

      Also, the oscillatory nature of the effect in Fig. 3C for 59% and 90% opacity is quite confusing and not addressed. The authors simply state that enhancement occurs earlier before the saccade for higher contrasts. But, this is not entirely true. The enhancement emerges then disappears and then emerges again leading up to the saccade. Why would foveal prediction do that?

      In response to this comment and a suggestion by Reviewer 1, we inspected the pre-saccadic time course of enhancement effects in a more temporally resolved fashion and, indeed, observed pronounced oscillations for the two higher target opacity conditions (see Results): 

      “Especially at higher target opacities, the temporal development of foveal enhancement appears to exhibit an oscillatory pattern. To inspect this incidental observation in a more temporally resolved fashion, we determined mean enhancement values in a boxcar window of 50 ms duration sliding along all saccade-locked probe offset time points (step size = 10 ms; x-axis values in Figure 4 indicate the latest time point in a certain window). We then fitted 6th order polynomials to the resulting time courses and compared the fitted values against zero using bootstrapping (see Methods). The average foveal enhancement across target opacities reached significance starting 115 ms before saccade onset (gray curve in Figure 4; all ps < .046). For every individual target opacity condition, we observed significant enhancement immediately before saccade onset, although only very briefly for the lowest opacity (-2–0 ms for 25%; -39–0 ms for 39%, -106–0 ms for 59% &  -13–0 ms for 90%; all ps < .050; yellow to dark red curves in Figure 4). Especially for the higher two target opacities, we observed a local maximum preceding eye movement onset by approximately 80 ms. Interestingly, assuming a peak in enhancement in approximately 80 ms intervals (i.e., at x-axis values of -80 and 0 ms in Figure 4) would correspond to an oscillation frequency of 12.5 Hz. In contrast to rapid feedforward processing, feedback signaling is associated with neural oscillations in the alpha and beta range (i.e., between 7 and 30 Hz; Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015).”
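      The moving-window step and the frequency estimate in the quoted passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the synthetic inputs, and the one-sided bootstrap are hypothetical, the 6th-order polynomial fit is omitted, and the exact bootstrap procedure is described only in the manuscript's Methods.

```python
import random


def boxcar_means(times_ms, values, width_ms=50, step_ms=10, t_start=-200, t_end=0):
    """Average `values` in sliding windows (t_latest - width_ms, t_latest].

    Each window is labelled by its latest time point, matching the x-axis
    convention described for Figure 4. Window width and step follow the
    quoted text (50 ms, 10 ms); t_start and t_end are assumptions.
    """
    out = []
    t_latest = t_start + width_ms
    while t_latest <= t_end:
        in_window = [v for t, v in zip(times_ms, values)
                     if t_latest - width_ms < t <= t_latest]
        if in_window:
            out.append((t_latest, sum(in_window) / len(in_window)))
        t_latest += step_ms
    return out


def bootstrap_p_above_zero(samples, n_boot=2000, seed=0):
    """One-sided bootstrap p-value for 'mean > 0' (hypothetical procedure).

    Resamples observers with replacement and returns the proportion of
    resampled means that are <= 0.
    """
    rng = random.Random(seed)
    n = len(samples)
    at_or_below = 0
    for _ in range(n_boot):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        if sum(resample) / n <= 0:
            at_or_below += 1
    return at_or_below / n_boot


def peak_interval_to_hz(interval_ms):
    """Convert a peak-to-peak interval in ms to a frequency in Hz."""
    return 1000.0 / interval_ms


if __name__ == "__main__":
    # Peaks at -80 ms and 0 ms are 80 ms apart, as stated in the text.
    print(peak_interval_to_hz(80))
```

      Because adjacent 50 ms windows overlap by 40 ms, neighbouring window means share most of their data, which is the source of the smoothing the authors acknowledge for this analysis.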

      We had observed an oscillatory pattern in multiple previous investigations, and in both Hit Rates to foveal orientation content and reflexive gaze velocities in response to peripheral motion information. So far, we have been unsure how to explain it. The literature on thalamic visual processing mentioned by the reviewer alerted us to the oscillatory nature of feedback signaling itself. Interestingly, the temporal frequency range of feedback oscillations includes the frequency of ~12.5 Hz observed in our data. We have included this and alternative explanations in the Discussion section (see below). We are aware, and acknowledge in the manuscript, that our analysis approach is purely descriptive, and that the potential explanations we give are speculative. 

      “Moreover, foveal congruency effects appeared to exhibit an oscillatory pattern, with peaks in a medium saccade preparation stage (~80 ms before the eye movement) and immediately before saccade onset. We have noticed this pattern in several investigations with substantially different visual stimuli and behavioral readouts. For instance, using a full-screen dot motion paradigm, we observed a pre-saccadic, small-gain ocular following response to coherent motion in the saccade target region (Kroell, Rolfs, & Mitchell, 2023, conference abstract; Kroell, 2023, dissertation). Predictive ocular following first reached significance ~125 ms before the eye movement, then decreased and subsequently ramped up again ~25 ms before saccade onset. Several explanatory mechanisms appear conceivable. Unlike rapid feedforward processing, feedback propagation has been shown to follow an oscillatory rhythm in the alpha and beta range, that is, between 7 and 30 Hz (Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle, et al., 2015). In our case, it is possible that the object-processing areas that send feedback to retinotopic visual cortex do so at a temporal frequency of ~12.5 Hz. At higher stimulus contrasts, feedforward signals may be fed back instantaneously and without the need for signal accumulation in feedback-generating areas. The resulting perceptual time courses may reflect innate temporal feedback properties most veridically. Alternatively, the initial enhancement peak may be related to the sudden onset of the saccade target stimulus and not to movement preparation itself. In this case, the initial peak should become particularly apparent if enhancement is aligned to the onset of the target stimulus. Yet, Figure 3 and Figure 4 suggest more prominent oscillations in saccade-locked time courses. 
In accordance with this, perceptual and attention processes have been shown to exhibit oscillatory modulations that are phase-locked to action onset (e.g., Tomassini, Spinelli, Jacono, Sandini, & Morrone, 2015; Hogendoorn, 2016; Wutz, Muschter, van Koningsbruggen, Weisz, & Melcher, 2016; Benedetto & Morrone, 2017; Tomassini, Ambrogioni, Medendorp, & Maris, 2017; Benedetto, Morrone & Tomassini, 2019). Whether the oscillatory pattern of foveal enhancement, as well as its increased prominence at higher target contrasts, relies on innate temporal properties of feedback  signaling, signal accumulation, saccade-locked oscillatory modulations of feedforward processing or attention, or a combination of these factors, one conclusion remains: task-induced cognitive influences suggested to underlie the considerable variability in temporal characteristics of foveal feedback during passive fixation (e.g., Fan et al., 2016; Weldon et al., 2016; 2020) are not the only possible explanation. Low-level target properties such as its luminance contrast modulate the resulting time course and should be equally considered, at least in our paradigm.”

      The interpretation of Fig. 4 is also confusing. Doesn't the longer latency already account for the lapse in attention, such that visual continuity can proceed normally now that the saccade is actually eventually made? In all results, it seems that the effects are all related to the dual nature of the task and/or attention, rather than to the act of making the saccade itself. Why should visual continuity (when a saccade is actually made, whether with short or long latency) have different "fidelity"? And, isn't this disruptive to the whole idea of visual continuity in the first place?

      We are unsure if we grasp the unifying concern behind these remarks. For the reviewer’s point on the dual-task nature of our paradigm, please consider our answer above. Perhaps it is important to note that we do not (and would never) claim that foveal prediction is the only mechanism underlying visual continuity. We believe that multiple mechanisms, including but not limited to pre-saccadic shifts of attention, predictive remapping of attention pointers and the perception of intra-saccadic signals interact and jointly contribute to visual continuity. It appears highly conceivable that, like most processes in biological systems, motor and perceptual performances are subject to fluctuations. We argue that saccade latencies as well as the magnitude of foveal prediction constitute read-outs of these variations. We also suggest that those read-outs are innately correlated beyond their common moderator of, perhaps, attentional state; we have previously presented clear evidence for a link between eye movement preparation and foveal prediction (Kroell & Rolfs, 2022; Figure 2). To the best of our judgment, we consider it reasonable that the effectiveness of movement-contingent perceptual processes varies with the effectiveness (in programming or execution) of the very movement motivating them. We present evidence for this assumption in our submission. We would also like to make clear that we do not assume our vision to fail entirely, even if every single well-known mechanism of visual continuity were to break down at once. Upon saccade landing, the visual system receives reliable visual input. Nonetheless, the visual system has undeniably developed mechanisms to optimize this process. We believe foveal prediction to rank among them.

      Small question: is it just me or does the data in general seem to be too excessively smoothed?

      We did not apply any smoothing to either the analysis or visualization of our data in the initial manuscript.

      Every observer completed a large number of trials (mean n = 2841 trials per observer; total trial number > 25,500), which likely contributes to the clarity of our data. To inspect the oscillatory pattern of enhancement in a more temporally resolved fashion (in response to the reviewer’s point above), we applied a moving window analysis in this revision. Due to overlapping window borders, this analysis introduces a certain degree of smoothing. Nonetheless, data patterns are comparable to the time course with only few non-overlapping time bins (Figure 3B; second row). In general, we have described all steps of our analysis routine extensively in the Methods section and will make our data publicly available upon publication of the Reviewed Preprint. 

      General comment: it is important to include line numbers in manuscripts, to help reviewers point to specific parts of the text when writing their comments. Otherwise, the peer review process is rendered unnecessarily complicated for the reviewers.

      We apologize and have added line numbers.

    1. eLife Assessment

      This is an interesting and important paper that grew from a careful clinical assessment of an unusual patient with hypoparathyroidism whose parathyroid glands synthesize and secrete a mutant form of PTH. This mutant PTH (R25C-PTH), when studied in mice and in vivo, has interesting properties. It can homodimerize and can raise blood calcium and lower blood phosphate levels: the opposite to the human phenotype in the index patient. These investigators perform a careful and convincing comparison of native PTH (1-34), an anabolic drug for osteoporosis treatment, and this R25C-mutant of PTH for their effects on bone mass, strength, microarchitecture, and metabolic activity. This dimeric mutant of PTH has anabolic properties raising the possibility that such forms of PTH could be developed as potent therapies for low bone mass states.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the functional difference between the most commonly expressed form of PTH, and a novel point mutation in PTH identified in a patient with chronic hypocalcemia and hyperphosphatemia. The value of this mutant form of PTH as a potential anabolic agent for bone is investigated alongside PTH(1-84), which is a current anabolic therapy. The authors have achieved the aims of the study.

      Strengths:

      The work is novel, as it describes the function of a novel, naturally occurring, variant of PTH in terms of its ability to dimerise, to lead to cAMP activation, to increase serum calcium, and its pharmacological action compared to normal PTH.

      [Editors' note: the original reviews are here, https://doi.org/10.7554/eLife.97579.1.sa1, and here, https://doi.org/10.7554/eLife.97579.2.sa1]

    1. eLife Assessment

      In this manuscript the authors established a novel three-dimensional culture system for stratified epithelia that allows epithelial cells to undergo epithelial-to-mesenchymal transition (EMT) and subsequent mesenchymal-to-epithelial transition (MET) while migrating through a membrane with 3.0-µm micropores, and, thus, provides a valuable tool to study EMT and possibly wound regeneration or metastasis. Furthermore, a set of experiments provides solid data suggesting that TGF beta signaling and actin polymerization promote movement of epithelial cells into the pores, while Piezo1 and Keratin 6 prevent keratinocyte migration and EMT.

    2. Reviewer #1 (Public review):

      Summary:

      The study describes the migration of epidermal keratinocytes through porous membranes and observes a unique size selection whereby only on 3-micron membranes are keratinocytes able to migrate and reform an intact epidermis. The authors propose that the model replicates three cell states of the intact epidermis, EMT, and MET. They also show that this response depends on the actin cytoskeleton and Piezo1, and the migration could be stimulated with TGFbeta ligands.

      Strengths:

      Strengths of the study include the establishment of a simple yet robust in vitro model that captures all three cell states, which could be useful for future investigation of wound healing or metastasis. There is also good characterisation of the pore size effects, providing some interesting observations about the physical regulation of keratinocyte migration. The images and presentation are clear.

      Weaknesses:

      (1) Some of the terminology would benefit from better definition or refinement. Triphasic suggests different physical behaviours (e.g. liquid-liquid phase separation) rather than cellular properties. Perhaps it would be better to refer to these as cell states or to describe the model more specifically as an invasion or EMT model. Likewise, the term 'reciprocating' implies two-way communication, but it is used to describe two-way migration or oscillating migration. Here, perhaps oscillatory would be clearer.

      (2) The quantification and statistical analysis of key results could be improved. Notably, quantification of immunostaining in Figures 1 and 2 would strengthen core findings, and greater detail is needed on the sample sizes and number of experiments used for statistical analysis. These details are missing or appear to be N=1 in some places.

      (3) There is an attempt to analyse the underlying molecular mechanisms, but these studies lack depth and detail. For example, it is not clear how actin, keratins, and piezo1 communicate to regulate cell migration. Are they acting directly on EMT genes such as SNAI1 or through changes in cell mechanics and cell-cell adhesions? Likewise, is TGF-beta signalling active in the system (e.g. nuclear pSMAD during cell migration)? As a result, the new biological insight is somewhat limited and confirms much of what is known about these pathways in keratinocyte migration.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Nohara et al. presents a novel 3D assay that allows for stratification of epithelia, active EMT through small pores, and active MET. They show that 3-µm pores allow keratinocytes to sample the pore through filopodia and up-regulate EMT genes to traverse the pores to the other side of the membrane, where EMT genes are downregulated as the cells re-establish stratified epithelia. The TGFbeta pathway and actin polymerization promote the movement of cells into the pores, and Piezo1 and KRT6 actively block this movement. This work provides a novel 3D assay that is likely to become a benchmark to analyze these processes using a more complex system than other current culture-specific EMT and MET assays.

      Strengths:

      The strengths of the manuscript include the foundational analysis of the pathways involved in establishing the tri-phasic epithelium. The authors have incorporated live imaging, drug studies, KO analysis, and RNA sequencing to show the relevant pathways involved.

      Weaknesses:

      While the authors provide strong evidence that the tri-phasic epithelium represents the EMT process, the MET process is largely relegated to the absence of EMT genes. It would be interesting to know how the stratified MET epithelia submerged in the media is similar or different from the stratified epithelia at the air-liquid interface.

    4. Reviewer #3 (Public review):

      Summary:

      The authors established an experimental system that reproduced three-dimensional triphasic epithelia, i.e., the original epithelium, its EMT, and MET. Keratinocytes (KCs), skin epithelial cells, placed on a microporous membrane migrated through 3.0-um or larger micropores. The 3.0-um-pored membrane induced an epithelial structure with three states: stratified KCs above the membrane, KCs showing EMT within the micropores, and a new stratified epithelium under the membrane. The membrane with larger micropores failed to maintain this triphasic epithelium. Live imaging revealed that KCs moved in a reciprocating manner, with actin-rich filopodia-like KC structures extending into and out of the 3.0-um micropores, while the cells migrated unidirectionally into larger micropores. KO of Piezo1 and keratin 6 increased KC entry to and exit from the 3.0-um micropores. Their results demonstrate that benign keratinocytes migrate through confined spaces in a reciprocating manner, which might help form triphasic epithelia, recapitulating wound healing processes.

      Strengths:

      Careful observation of the behaviour of keratinocytes on the different-sized pores. CRISPR-Cas9 gene editing to KO Piezo1 and keratin 6 isoforms in HaCaT keratinocytes.

      Weaknesses:

      There is no analysis of the matrix produced by the keratinocytes on the different pore sizes as this may influence migration.

      HaCaT cells are quite different from normal keratinocytes in terms of migration. Pilcher et al. PMID: 9182674

    1. eLife Assessment

      This work represents an important contribution to our understanding of how membrane energetics influence protein conformation and function in mechano-sensitive channels. Through extensive molecular dynamics simulations and energetic analysis, the study demonstrates how the channel structure is shaped by a balance of protein and membrane-induced forces, effectively reconciling experimental data from different membrane environments. However, while much of the computational data is convincing, some aspects of the energetic analysis and models employed remain incomplete.

    2. Reviewer #1 (Public review):

      Dixit, Noe, and Weikl apply coarse-grained and all-atom molecular dynamics to determine the response of the mechanosensitive proteins Piezo 1 and Piezo 2 to tension. Cryo-EM structures in micelles show a high curvature of the protein whereas structures in lipid bilayers show lower curvature. Is the zero-stress state of the protein closer to the micelle structure or the bilayer structure? Moreover, while the tension sensitivity of channel function can be inferred from the experiment, molecular details are not clearly available. How much does the protein's height and effective area change in response to tension? With these in hand, a quantitative model of its function follows that can be related to the properties of the membrane and the effect of external forces.

      Simulations indicate that in a bilayer the protein relaxes from the highly curved cryo-EM dome (Figure 1).

      Under applied tension, the dome flattens (Figure 2) including the underlying lipid bilayer. The shape of the system is a combination of the membrane mechanical and protein conformational energies (Equation 1). The membrane's mechanical energy is well-characterized. It requires only the curvature and bending modulus as inputs. They determine membrane curvature and the local area metric (Equation 4) by averaging the height on a grid and computing second derivatives (Equations 7, 8) consistent with known differential geometric formulas.

      The bending energy can be limited to the nano dome but this implies that the noise in the membrane energy is significant. Where there is noise outside the dome there is noise inside the dome. At the least, they could characterize the noisy energy due to inadequate averaging of membrane shape.

      My concern for this paper is that they are significantly overestimating the membrane deformation energy based on their numerical scheme, which in turn leads to a much stiffer model of the protein itself. Two things would address this:

      (1) Report the membrane energy under different graining schemes (e.g., report schemes up to double the discretization grain).

      (2) For a Gaussian bump with sigma=6 nm I obtained a bending energy of 0.6 kappa, so certainly in the ballpark with what they are reporting but significantly lower (compared to 2 kappa, Figure 5 lower left). It would be simpler to use the Gaussian approximation to their curves in Figure 3 - and I would argue more accurate, especially since they have not reported the variation of the membrane energy with respect to the discretization size and so I cannot judge the dependence of the energy on discretization. I view reporting the variation of the membrane energy with respect to discretization as being essential for the analysis if their goal is to provide a quantitative estimate for the force of Piezo. The Helfrich energy computed from an analytical model with a membrane shape closely resembling the simulated shapes would be very helpful. According to my intuition, finite-difference estimates of curvatures will tend to be overestimates of the true membrane deformation energy because white noise tends to lead to high curvature at short-length scales, which is strongly penalized by the bending energy.
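      The discretization check requested in points (1) and (2) can be illustrated with a small sketch. It rests on several assumptions that are not taken from the manuscript or the review: the small-gradient (Monge) approximation, in which the Helfrich bending energy of a height field h is (kappa/2) times the integral of the squared Laplacian of h; a five-point finite-difference Laplacian; a hypothetical bump height h0 = 2.62 nm, chosen only because the analytic value pi * h0^2 / sigma^2 is then close to 0.6 kappa for sigma = 6 nm (the review does not state the height used); and uncorrelated Gaussian height noise as a stand-in for inadequate averaging.

```python
import math
import random


def gaussian_bump(n, dx, h0, sigma, noise_sd=0.0, seed=0):
    """Height field h0 * exp(-r^2 / (2 sigma^2)) on an n x n grid centred at 0.

    Optional white noise (standard deviation noise_sd, in nm) mimics
    residual sampling noise in an averaged membrane shape.
    """
    rng = random.Random(seed)
    half = (n - 1) / 2
    grid = []
    for i in range(n):
        x = (i - half) * dx
        row = []
        for j in range(n):
            y = (j - half) * dx
            h = h0 * math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            if noise_sd > 0:
                h += rng.gauss(0.0, noise_sd)
            row.append(h)
        grid.append(row)
    return grid


def bending_energy_over_kappa(h, dx):
    """Small-gradient Helfrich energy divided by kappa: 0.5 * sum(lap^2) * dx^2.

    Uses a five-point Laplacian on interior grid points only.
    """
    n = len(h)
    total = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (h[i + 1][j] + h[i - 1][j] + h[i][j + 1] + h[i][j - 1]
                   - 4.0 * h[i][j]) / dx ** 2
            total += 0.5 * lap * lap * dx * dx
    return total


def analytic_energy_over_kappa(h0, sigma):
    """Closed-form small-gradient result for a Gaussian bump: pi * h0^2 / sigma^2."""
    return math.pi * h0 ** 2 / sigma ** 2


if __name__ == "__main__":
    h0, sigma, box = 2.62, 6.0, 60.0  # nm; h0 and box size are assumptions
    print("analytic:", analytic_energy_over_kappa(h0, sigma))
    for dx in (1.0, 2.0, 3.0):
        n = int(round(box / dx)) + 1
        clean = bending_energy_over_kappa(gaussian_bump(n, dx, h0, sigma), dx)
        noisy = bending_energy_over_kappa(
            gaussian_bump(n, dx, h0, sigma, noise_sd=0.02), dx)
        print(dx, clean, noisy)
```

      In this toy setting the expected noise contribution grows roughly as the inverse fourth power of the grid spacing, so reporting the energy at several spacings, as the reviewer asks, would show directly whether the estimate is dominated by short-length-scale noise.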

      The fitting of the system deformation to the inverse time appears to be incredibly ad hoc ... Nor is it clear that the quantified model will be substantially changed without extrapolation. The authors should either justify the extrapolation more clearly (sorry if I missed it!) or also report the unextrapolated numbers alongside the extrapolated ones.

      In summary, this paper uses molecular dynamics simulations to quantify the force of the Piezo 1 and Piezo 2 proteins on a lipid bilayer using simulations under controlled tension, observing the membrane deformation, and using that data to infer protein mechanics. While much of the physical mechanism was previously known, the study itself is a valuable quantification. I identified one issue in the membrane deformation energy analysis that has large quantitative repercussions for the extracted model.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors suggest that the structure of Piezo2 in a tensionless simulation is flatter compared to the electron microscopy structure. This is an interesting observation and highlights the fact that the membrane environment is important for Piezo2 curvature. Additionally, the authors calculate the excess area of Piezo2 and Piezo1, suggesting that it is significantly smaller compared to the area calculated using the EM structure or simulations with restrained Piezo2. Finally, the authors propose an elastic model for Piezo proteins. Those are very important findings, which would be of interest to the mechanobiology field.

      Whilst I like the suggestion that the membrane environment will change Piezo2 flatness, could this be happening because of the lower resolution of the MARTINI simulations? In other words, would it be possible that MARTINI is not able to model such curvature due to its lower resolution?

      Related to my comment above, the authors say that they only restrained the secondary structure using an elastic network model. Whilst I understand why they did this, Piezo proteins are relatively large. How can the authors know that this type of elastic network model restraints, combined with the fact that MARTINI simulations are perhaps not very accurate in predicting protein conformations, can accurately represent the changes that happen within the Piezo channel during membrane tension?

      Modelling of Piezo1 seems to be based on homology to Piezo2. However, the authors need to further evaluate their model, e.g. how it compares with an Alphafold model.

      To calculate the tension-induced flattening of the Piezo channel, the authors "divide all simulation trajectories into 5 equal intervals and determine the nanodome shape in each interval by averaging over the conformations of all independent simulation runs in this interval.". However, the change in the flattening of the Piezo channel probably happens very quickly during the simulations, possibly within the same interval. Is this the case? And if so, does this affect their calculations?
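      The concern about interval averaging can be made concrete with a short sketch. Everything here is hypothetical and for illustration only: the function name, the exponential relaxation of the dome height, and the time constant are assumptions, not values from the manuscript.

```python
import math


def interval_means(runs, n_intervals=5):
    """Split equal-length per-run time series into equal intervals and
    average each interval over all frames of all runs."""
    n_frames = len(runs[0])
    size = n_frames // n_intervals
    means = []
    for k in range(n_intervals):
        chunk = [v for run in runs for v in run[k * size:(k + 1) * size]]
        means.append(sum(chunk) / len(chunk))
    return means


if __name__ == "__main__":
    # Hypothetical dome height (nm) relaxing from 10 to 6 with a time
    # constant of 20 frames, sampled for 500 frames in one run.
    run = [6.0 + 4.0 * math.exp(-t / 20.0) for t in range(500)]
    print(interval_means([run]))
```

      If the relaxation time is much shorter than one interval, as in this toy series, nearly all of the flattening falls inside the first interval and the interval means alone cannot resolve its time course.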

      Finally, the authors use a specific lipid composition, which is asymmetric. Is it possible that the asymmetry of the membrane causes some of the changes in the curvature that they observe? Perhaps more controls, e.g. with a symmetric POPC bilayer are needed to identify whether membrane asymmetry plays a role in the membrane curvature they observe.

    4. Reviewer #3 (Public review):

      Strengths:

      This work focuses on a problem of deep significance: quantifying the structure-tension relationship and underlying mechanism for the mechanosensitive Piezo 1 and 2 channels. This objective presents a few technical challenges for molecular dynamics simulations, due to the relatively large size of each membrane-protein system. Nonetheless, the technical approach chosen is based on the methodology that is, in principle, established and widely accessible. Therefore, another group of practitioners would likely be able to reproduce these findings with reasonable effort.

      Weaknesses:

      The two main results of this paper are (1) that both channels exhibit a flatter structure compared to cryo-EM measurements, and (2) their estimated force vs. displacement relationship. Although the former correlates at least quantitatively with prior experimental work, the latter relies exclusively on simulation results and model parameters.

    1. eLife Assessment

      This important work substantially advances our understanding of the interaction among gut microbiota, lipid metabolism, and the host in type 2 diabetes. However, some evidence is incomplete, particularly in the mouse experiments with FMT. Additional experiments will be required to strengthen the authors' interesting findings.

    2. Reviewer #1 (Public review):

      Summary:

      The authors tried to identify the relationships among the gut microbiota, lipid metabolites, and the host in type 2 diabetes (T2DM) by using macaques that spontaneously develop T2DM, considered one of the best models of the human disease.

      Strengths:

      The authors comprehensively compared the gut microbiota and plasma fatty acids between macaques with spontaneous T2DM and control macaques and verified the results with macaques on a high-fat diet-fed mice model.

      Weaknesses:

      The observed multi-omics of the macaques can be done on humans, which weakens the impact of the conclusion of the manuscript.

      In addition, the age and sex of the control macaque group did not necessarily match those of the T2DM group, leaving the possibility for compromising the analysis.

      Regarding the metabolomic analysis, the authors did not include fecal samples which are important, considering the authors' claim about the importance of gut microbiota in the pathogenesis of T2DM.

      In the mouse experiments, the control group should be given a FMT from control macaques rather than just untreated SPF mice since the fecal microbiota composition is likely very different between macaques and mice. Additionally, the palmitic acid-containing diets fed to mice to induce a diabetes-like condition do not mimic spontaneous T2DM in macaques.

    3. Reviewer #2 (Public review):

      This study analyzes the interaction among the gut microbiota, lipid metabolism, and the host in type 2 diabetes (T2DM) using rhesus macaques. The authors first identified 8 macaques with T2DM from 1698 individuals. Then, they observed in T2DM macaques: dysbiosis by 16S rRNA gene amplicon analysis and shotgun sequencing, imbalanced tryptophan metabolism and fatty acid beta oxidization in the feces by metabolome analysis, increased plasma concentration of palmitic acid by MS analysis, and an inflammatory gene signature of blood cells by transcriptomic analysis. Finally, they transplanted feces of T2DM macaques into mice and fed them with palmitic acid and showed that those mice became diabetic through increased absorption of palmitic acid in the ileum.

      This study clearly shows the interaction among gut microbiota, lipid metabolism, and the host in T2DM. The experiments were well designed and performed, and the data are convincing. One point I would suggest is that in the experiments of mice with FMT, control mice should be those colonized with feces of healthy macaques, but not with no FMT.

    1. eLife Assessment

      The study presents important findings on inositol-requiring enzyme (IRE1α) inhibition on diet-induced obesity (overnutrition) and insulin resistance where IRE1α inhibition enhances thermogenesis and reduces the metabolically active and M1-like macrophages in adipose tissue. The evidence supporting the conclusions is convincing and the work will be of interest to cell biologists and biochemists working in metabolism, insulin resistance, and inflammation.

    2. Reviewer #1 (Public review):

      First, the authors confirm the up-regulation of the main genes involved in the three branches of the Unfolded Protein Response (UPR) system in diet-induced obese mice in AT, observations that have been extensively reported before. Not surprisingly, IRE1a inhibition with STF led to an amelioration of the obesity and insulin resistance of the animals. Moreover, non-alcoholic fatty liver disease was also improved by the treatment. More novel are their results in terms of thermogenesis and energy expenditure, where IRE1a seems to act via activation of brown AT. Finally, mice treated with STF exhibited significantly fewer metabolically active and M1-like macrophages in the AT compared to those under vehicle conditions. Overall, the authors conclude that targeting IRE1a has therapeutical potential for treating obesity and insulin resistance.

      The study has some strengths, such as the detailed characterization of the effect of STF in different fat depots and a thorough analysis of macrophage populations. However, the lack of novelty in the findings somewhat limits the study's impact on the field.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript by Wu D. et al. explores an innovative approach in immunometabolism and obesity by investigating the potential of targeting macrophage Inositol-requiring enzyme 1α (IRE1α) in cases of overnutrition. Their findings suggest that pharmacological inhibition of IRE1α could influence key aspects such as adipose tissue inflammation, insulin resistance, and thermogenesis. Notable discoveries include the identification of High-Fat Diet (HFD)-induced CD9+ Trem2+ macrophages and the reversal of metabolically active macrophages' activity with IRE1α inhibition using STF. These insights could significantly impact future obesity treatments.

      Strengths:

      The study's key strengths lie in its identification of specific macrophage subsets and the demonstration that inhibiting IRE1α can reverse the activity of these macrophages. This provides a potential new avenue for developing obesity treatments and contributes valuable knowledge to the field.

      Weaknesses:

      The research lacks an in-depth exploration of the broader metabolic mechanisms involved in controlling diet-induced obesity (DIO). Addressing this gap would strengthen the understanding of how targeting IRE1α might fit into the larger metabolic landscape.

      Impact and Utility:

      The findings have the potential to advance the field of obesity treatment by offering a novel target for intervention. However, further research is needed to fully elucidate the metabolic pathways involved and to confirm the long-term efficacy and safety of this approach. The methods and data presented are useful, but additional context and exploration are required for broader application and understanding.

      Comments on revisions:

      The author has revised the manuscript and addressed the most relevant comments raised by the reviewers. The paper is now significantly improved, though two minor issues remain.

      (1) Studies were limited to male mice; this should be mentioned in the paper's Title.

      (2) Please include the sample size (n=) in all provided tables in the main manuscript and supplementary tables.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents important findings on inositol-requiring enzyme (IRE1α) inhibition on diet-induced obesity (overnutrition) and insulin resistance where IRE1α inhibition enhances thermogenesis and reduces the metabolically active and M1-like macrophages in adipose tissue. The evidence supporting the conclusions is convincing but can be enhanced with information/data on the validity, specificity, selectivity, and toxicity of the IRE1α inhibitor and supported with more detail on the mechanisms by which adipose tissue macrophages influence adipocyte metabolism. The work will be of interest to cell biologists and biochemists working in metabolism, insulin resistance, and inflammation.

      We thank the editors for the assessment and appreciation of our findings in this study. In the revision, we have added information on the validity, selectivity, and toxicity of the IRE1α inhibitor. In addition, we discuss the likelihood that suppression of the metabolically activated proinflammatory macrophage population in adipose tissue contributes to the reversal of adipose remodeling and to thermogenesis. We have also improved the manuscript significantly throughout the text and figures following the recommendations of the reviewers.

      Public Reviews:

      Reviewer #1 (Public review):

      First, the authors confirm the up-regulation of the main genes involved in the three branches of the Unfolded Protein Response (UPR) system in diet-induced obese mice in AT, observations that have been extensively reported before. Not surprisingly, IRE1a inhibition with STF led to an amelioration of the obesity and insulin resistance of the animals. Moreover, non-alcoholic fatty liver disease was also improved by the treatment. More novel are their results in terms of thermogenesis and energy expenditure, where IRE1a seems to act via activation of brown AT. Finally, mice treated with STF exhibited significantly fewer metabolically active and M1-like macrophages in the AT compared to those under vehicle conditions. Overall, the authors conclude that targeting IRE1a has therapeutical potential for treating obesity and insulin resistance.

      The study has some strengths, such as the detailed characterization of the effect of STF in different fat depots and a thorough analysis of macrophage populations. However, the lack of novelty in the findings somewhat limits the study's impact on the field.

      We thank the reviewer for the appreciation of our findings and for the comments on novelty. We would like to emphasize several novel aspects of this manuscript. First, as the reviewer correctly pointed out, we discovered for the first time that IRE1 inhibition by STF activates brown AT and promotes thermogenesis, and that IRE1 inhibition not only significantly attenuates the newly discovered CD9+ ATMs and the “M1-like” CD11c+ ATMs but also diminishes the M2 ATMs. These discoveries are important and novel. In obesity, ATMs were originally proposed to undergo M1/M2 polarization from an anti-inflammatory M2 state to a classical pro-inflammatory M1 state. It was further reported that IRE1 deletion improves thermogenesis by boosting the M2 population, which then synthesizes and secretes catecholamines to promote thermogenesis. It is now known that M2 macrophages do not synthesize catecholamines or promote thermogenesis. In this study, we discovered that IRE1 inhibition does not increase (but instead decreases) the M2 population, and that IRE1 inhibition likely promotes thermogenesis by suppressing pro-inflammatory macrophage populations, including the M1-like ATMs and, most importantly, the newly identified metabolically active macrophages, given that ATM inflammation has been reported to suppress thermogenesis. Second, this study presents the first characterization of the relationship between the more classical M1-like ATMs and the newly discovered metabolically active ATMs, showing that the CD11c+ M1-like ATMs largely overlap with, but are not identical to, the CD9+ ATMs in the eWAT under HFD. Third, although upregulation of ER stress response genes in the adipose tissues of diet-induced obese mice has been extensively reported, this does not necessarily mean that targeting IRE1a or ER stress can reverse existing insulin resistance and obesity. It is not uncommon for a therapy to fail to yield the expected effect. For instance, because amyloid plaques are a hallmark of Alzheimer's disease (AD), interventions that prevent or reverse beta-amyloid deposition were expected to prevent progression or even reverse cognitive impairment in AD patients. However, clinical trials of such therapies have been disappointing. In essence, experimental demonstration of effectiveness or feasibility for any potential therapeutic target is a first step toward any future clinical implementation.

      Reviewer #2 (Public review):

      The manuscript by Wu et al demonstrated that IRE1a inhibition mitigated insulin resistance and other comorbidities through increased energy expenditure in DIO mice. In this reviewer's opinion, this timely study has high significance in the field of metabolism research for the following reasons.

      (1) The authors' findings are significant and may offer a new therapeutic target to treat metabolic diseases, including diabetes, obesity, NAFLD, etc.

      (2) The authors carefully profiled the ATMs and examined the changes in gene expression after STF treatment.

      (3) The authors presented evidence collected from both systemic indirect calorimetry and individual tissue gene expression to support the notion of increased energy expenditure.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      We thank the reviewer for the appreciation of our work.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Wu D. et al. explores an innovative approach to immunometabolism and obesity by investigating the potential of targeting macrophage Inositol-requiring enzyme 1α (IRE1α) in cases of overnutrition. Their findings suggest that pharmacological inhibition of IRE1α could influence key aspects such as adipose tissue inflammation, insulin resistance, and thermogenesis. Notable discoveries include the identification of High-Fat Diet (HFD)-induced CD9+ Trem2+ macrophages and the reversal of metabolically active macrophages' activity with IRE1α inhibition using STF. These insights could significantly impact future obesity treatments.

      Strengths:

      The study's key strengths lie in its identification of specific macrophage subsets and the demonstration that inhibiting IRE1α can reverse the activity of these macrophages. This provides a potential new avenue for developing obesity treatments and contributes valuable knowledge to the field.

      Weaknesses:

      The research lacks an in-depth exploration of the broader metabolic mechanisms involved in controlling diet-induced obesity (DIO). Addressing this gap would strengthen the understanding of how targeting IRE1α might fit into the larger metabolic landscape.

      Impact and Utility:

      The findings have the potential to advance the field of obesity treatment by offering a novel target for intervention. However, further research is needed to fully elucidate the metabolic pathways involved and to confirm the long-term efficacy and safety of this approach. The methods and data presented are useful, but additional context and exploration are required for broader application and understanding.

      We thank the reviewer for the appreciation of strengths in our manuscript. In particular, we appreciate the reviewer’s recommendation on the exploration of broader metabolic landscape, such as the effect of IRE1 inhibition on non-adipose tissue macrophages and metabolism. We agree that achieving these will certainly broaden the therapeutic potential of IRE1 inhibition to larger metabolic disorders and we will pursue these explorations in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A list of recommendations for the authors is presented below:

      (1) Please, update the literature review to include more recent studies relevant to the topic.

      We thank the reviewer for the suggestion. We have added more references from recent studies.

      (2) Please, provide a detailed explanation of how STF functions, including potential off-target effects or issues related to specificity.

      We thank the reviewer for the suggestion. STF is a small-molecule inhibitor designed to selectively inhibit the RNase activity of IRE1a. Once IRE1a is activated (e.g., in obesity), its RNase domain initiates the unconventional splicing of the mRNA encoding the transcription factor X-box binding protein 1 (XBP1) and the Regulated IRE1-Dependent Decay (RIDD) of microRNAs, which is detrimental if prolonged. IRE1a RNase inhibitors, including STF, engage the RNase active site of IRE1a with high affinity and specificity by exploiting a shallow complementary pocket through pi-stacking interactions with His910 and Phe889 and an essential Schiff base interaction between the aldehyde moiety of the inhibitor and the side-chain amino group of Lys907 (Sanches et al., NComm 2014, PMID: 25164867). This specific, high-affinity binding blocks IRE1a RNase activity, preventing the splicing of XBP1 mRNA and RIDD. As IRE1a has been shown to be activated in multiple tissues under various pathological conditions and to be responsible for the progression of those conditions, inhibition of IRE1a by pharmacological agents, including STF, has great potential for the treatment of various pathological disorders. Several studies have reported that STF shows no overt toxicity when administered systemically (Madhavan et al., 2022, PMID 35105890; Herlea-Pana et al., 2021, PMID 34675883; Papandreou et al., 2011, PMID 21081713; Tufanli et al., 2017, PMID 28137856).

      (3) Lines 263-266 require a reference.

      We thank the reviewer for the suggestion. A reference has been added.

      (4) Stromal vascular fraction (SVF) also contains a significant amount of preadipocytes and stem cells, not only macrophages, which might affect the conclusions reached by the authors.

      We thank the reviewer for the comment. It is true that SVF consists of multiple cell types, including endothelial cells, macrophages, preadipocytes, and various stem cell populations. In HFD-induced obesity, adipose tissue undergoes significant remodeling, and the percentage of macrophages in the SVF of obese adipose tissue increases significantly relative to other cell types. In our studies, SVFs from adipose tissues of obese mice were isolated, cultured, and treated with STF overnight. We observed that IRE1 RNase activity in SVFs was inhibited by STF treatment, and that the ATM population and the expression of pro-inflammatory genes were downregulated by STF. Given the short-term treatment, the most parsimonious interpretation of the data is that STF acts directly on ATMs. However, we cannot completely rule out the possibility that effects of STF on other cell types influence the ATMs and inflammatory gene expression. As such, we have modified our conclusion from “these results indicate that STF acts directly on ATMs to regulate inflammation” to “these results indicate that STF likely acts directly on ATMs to regulate inflammation”.

      (5) Figures 1A and G: It is common practice to present the XBP1s/XBP1u ratio; consider using this standard measure.

      We thank the reviewer for the comment. Both ways of presenting XBP1 mRNA splicing appear in the literature. A number of papers (for instance, Cell 2014, PMID 25018104; NCB 2012, PMID 23086298) use the XBP1s/(XBP1s+XBP1u) ratio. We prefer this presentation because it shows spliced XBP1 (XBP1s) relative to total XBP1 mRNA (XBP1s+XBP1u).
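
      To make the relationship between the two conventions concrete, the following is a minimal sketch and not code from the manuscript. The function names and the input values are hypothetical. It assumes the XBP1s and XBP1u signals are already quantified on the same scale (for example, band intensities), and shows that the spliced fraction XBP1s/(XBP1s+XBP1u) and the XBP1s/XBP1u ratio can be converted into each other.

```python
def splicing_fraction(xbp1s, xbp1u):
    """Spliced fraction XBP1s / (XBP1s + XBP1u); lies between 0 and 1."""
    total = xbp1s + xbp1u
    if total <= 0:
        raise ValueError("total XBP1 signal must be positive")
    return xbp1s / total


def fraction_to_ratio(fraction):
    """Convert a spliced fraction into the XBP1s/XBP1u ratio."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return fraction / (1 - fraction)


def ratio_to_fraction(ratio):
    """Convert an XBP1s/XBP1u ratio into the spliced fraction."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio / (1 + ratio)


# Hypothetical band intensities, for illustration only.
fraction = splicing_fraction(30.0, 70.0)
ratio = fraction_to_ratio(fraction)
```

      Because each measure is a monotonic function of the other, the direction of a change between conditions is the same under both conventions. The fraction has the practical advantage of being bounded, whereas the ratio grows without limit as XBP1u approaches zero.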

      (6) Figure 1F: please indicate the type of AKT phosphorylation assessed.

      We thank the reviewer for the comment. We have added Ser473 as the phosphorylation site in both the figure legend and the figure.

      (7) Figures 2E-H: please clearly indicate the specific fat depots analyzed in each figure.

      We thank the reviewer for the comment. We have added the information to the figure legends and figures.

      (8) Figures 1I and 3A, and Supplementary Figures 6D-E: please include a quantification analysis of the images presented.

      We thank the reviewer for the suggestion. We have added quantification of the images.

      (9) In Figure 3D the image corresponding to the merge for the STF condition is a duplication of the control, please correct this.

      We thank the reviewer for pointing this out. We have replaced it with the correct image.

      (10) Figures 4B-F: please provide individual data points in the graphs to show variability and sample distribution.

      We thank the reviewer for the suggestion. We have re-plotted the graphs in Fig. 4B-F with the individual data points.

      (11) Figure 4I: it is rather unusual to have such a strong signal of UCP1 in ND conditions, please explain.

      We thank the reviewer for the comment. We wish to point out that the images were taken from BAT slides. UCP1 is expected to show strong staining in BAT under ND conditions, and this staining is, as expected, weakened under HFD conditions. STF treatment was able to correct the HFD-induced weakening of UCP1 staining in BAT.

      (12) Supplementary Figures 2C-D: please provide representative images for better clarity and interpretation.

      We thank the reviewer for the comment. The representative images for Supplementary Figures 2C-D are shown in Figures 2C and F. Supplementary Figures 2C-D are simply the quantification of adipocyte areas for Figures 2C and F.

      (13) Supplementary Table 3 is repeated, please remove.

      We thank the reviewer for the comment. We have deleted this repetition.

      Reviewer #2 (Recommendations for the authors):

      The manuscript can be further strengthened with more clarification on the following points.

      (1) The use of IRE1a pharmacological inhibitor STF-083010 (STF) needs to be validated. How was the dose determined? Were there any dose-dependent studies? Under the current dosing regimen, what are the specificity, selectivity, and toxicity of STF? Also, were the serine/threonine kinase and RNase activities measured in the adipocytes and ATMs of the animals dosed with the compound? What's the PK data?

      We thank the reviewer for the comments. In the animal study, we administered STF at 10 mg/kg by intraperitoneal injection. This dose was adopted from several recent studies (Madhavan et al., 2022, PMID 35105890; Herlea-Pana et al., 2021, PMID 34675883; Papandreou et al., 2011, PMID 21081713; Tufanli et al., 2017, PMID 28137856), in which STF treatment showed beneficial effects in the respective disease models. STF did not compromise cell viability or induce any other toxicity at the doses or concentrations used in these studies (Papandreou I, et al., 2011; Upton JP, et al., 2012; Lerner AG, et al., 2012; Kemp KL, et al., 2013; Cross BC, et al., 2012). In our study, we did not observe any apparent toxicity in mice at this dose. Importantly, we did observe that STF inhibited IRE1 RNase activity in adipose tissues (Fig. 1G, Fig. S1D) and ATMs (Fig. 6Q; Fig. S8C, G, I) of the animals at this dose. As IRE1 inhibitors including STF have been extensively examined and shown to have no effect on the kinase function of IRE1 (Cross et al., 2012, PMID: 22315414; Tufanli et al., 2017, PMID 28137856), we did not assay IRE1 kinase activity. Additionally, as the compound has been administered in several animal models with significant beneficial effects, one would assume that adequate pharmacokinetic parameters are achieved with the current dose. Systematic PK studies will be important and necessary in the future if clinical trials are to be considered.

      (2) The statistical method for individual panels in each figure needs to be specified.

      We thank the reviewer for the suggestion. We have specified the statistical method in the figure legends.

      (3) In Figure 1E, there's no difference in fasting insulin levels, though a difference was detected after the glucose load. This suggests an effect on insulin secretion but not insulin sensitivity.

      We thank the reviewer for the comments. Fasting insulin levels still differ between the Veh and STF groups, although the difference does not reach statistical significance. Under glucose stimulation, insulin levels show the same trend at all time points: the STF group is lower than the Veh group. Even if insulin levels did not differ between the two groups, with or without glucose stimulation, the fact that blood glucose levels are lower in the STF group than in the Veh group at all time points (Fig. 1C) would indicate improved insulin sensitivity. In our study, insulin levels were lower in the STF group, yet blood glucose levels were still lowered by STF, further strengthening the notion that STF treatment improves insulin sensitivity. This is further corroborated by the ITT results (Fig. 1D).

      (4) Figure 2 and S2A did not show a decrease in BW but rather BW gain. The statement (line 308) needs to be edited. As a result of this, the relative fat mass measurement (% of BW) needs to be presented in addition to Figure 2B.

      We thank the reviewer for the comments and suggestions. As shown in Figs. 2A and S2A, STF-treated mice showed a slight decrease in body weight (~2 g), while the Veh group gained ~3.5 g by the end of 4 weeks of treatment. As shown in Fig. 2B, this difference in body weight between the Veh and STF groups was primarily due to a reduction in fat tissue. In the revision, we also added the percentages of fat and lean mass relative to total body weight in Supplemental Fig. 2B, which show a similar trend.

      (5) The measurement of blood lipid levels in Figure 3F-H is informative. More importantly, hepatic lipid content needs to be measured.

      We thank the reviewer for the comments, with which we agree. As this study is focused on insulin resistance and adipose tissue remodeling, we did not go deep into the comorbidities beyond the reported observations. It will be interesting to explore the effects of IRE1 inhibition on the comorbidities of obesity and insulin resistance, including measurement of hepatic lipid content, in future studies.

      Minor corrections:

      (1) Line 261: "(spliced".

      Done. We have corrected it.

      (2) Line 334: spell out "PEPCK".

      We have added the full name “Phosphoenolpyruvate carboxykinase”. Thanks!

      (3) Line 478: please rephrase.

      We thank the reviewer for the comment. We have rephrased the sentence as follows: “These results reveal that STF treatment suppresses the adipose tissue inflammation and the accumulation of pro-inflammatory ATMs without augmenting (suppressing instead) M2-like ATMs.”

      (4) Figure 4L: "pGC1-a".

      We thank the reviewer for pointing this out. We have corrected the name.

      (5) Figure 4O: missing Y-axis label.

      We have added the label. Thanks!

      Reviewer #3 (Recommendations for the authors):

      The observations presented by Wu D. et al. in the manuscript are potentially interesting and relevant. The current study seeks to build upon previous findings, specifically from the work titled, "Silencing IRE1α using myeloid-specific cre suppresses alternative activation of macrophages and impairs energy expenditure in obesity." By using a pharmacological inhibitor to modulate IRE1α activity in adipose tissue macrophages (ATMs), the authors aim to develop therapeutics that could significantly impact the treatment of obesity and metabolic disease.

      The authors have performed some satisfactory experiments related to liver steatosis. However, the manuscript would benefit from a more comprehensive exploration of the mechanisms by which ATMs influence adipocyte metabolism, particularly in epididymal white adipose tissue (eWAT). In particular, the study should investigate how adiposity and lipid droplet size change in response to alterations in lipolysis and adipogenesis, as this could provide insights into how these processes contribute to the amelioration of the obesity phenotype.

      Several issues should be addressed to strengthen the manuscript and make the study more convincing. Below are specific comments and recommendations:

      Major:

      (1) The indirect calorimetric data should be normalized for dependent variables such as body weight, lean mass, and fat mass+ lean mass to accurately interpret the results. The results for 24-hour energy expenditure should be included in Figure 4B-F to provide a more comprehensive analysis. It is recommended to plot bar graphs with all individual data points for the energy expenditure (EE) results shown in Figure 4B-F, to offer a clearer and more detailed presentation of the data (Figure 4B-F).

      We thank the reviewer for the comments. Data analysis for indirect calorimetry studies has evolved over the years. One common practice has been to normalize the data by body weight. However, this approach was deemed improper some years ago (Tschop et al Nature Methods 2012, PMID: 22205519). The Tschop paper also pointed out the shortcomings associated with normalization by lean mass. Instead, it concludes that “generalized linear model is the most appropriate statistical approach to accommodate discrete (genotype) and continuous (body mass) traits, rather than using a simple division by BW or lean BW”. In our study, we used CalR, an improved generalized linear model (which includes ANOVA and ANCOVA) (Mina et al Cell Metabolism 2018, PMID: 30017358), for all our energy expenditure data analysis (shown in Fig. 4A-E). In the revision, we also included data analysis normalized by BW (Fig. S2F-H’), which shows an even wider difference between the Veh and STF groups than the data shown in Fig. 4A-F. As STF decreased fat mass and had little effect on lean mass, normalization by fat mass or by fat mass + lean mass would give a more drastic difference than the data shown in Fig. 4A-E, whereas normalization by lean mass would give results similar to those shown in Fig. 4A-E. In addition, we replotted the graphs in Fig. 4B, D, F-H with the individual data points.
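
      As an illustration of the difference between dividing by body weight and fitting body mass as a covariate, the sketch below fits a simple ANCOVA-style linear model by ordinary least squares. This is not CalR's implementation and does not use data from the manuscript. The function names, the synthetic values, and the model form (energy expenditure = intercept + slope × body mass + group effect) are assumptions chosen only to show the idea.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ancova(energy, mass, group):
    """Least-squares fit of energy = b0 + b_mass * mass + b_group * group.

    group is 0 for the reference arm and 1 for the treated arm.
    """
    rows = [[1.0, float(w), float(g)] for w, g in zip(mass, group)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, energy)) for i in range(k)]
    b0, b_mass, b_group = solve_linear_system(xtx, xty)
    return {"intercept": b0, "mass_slope": b_mass, "group_effect": b_group}


def mean_ratio(energy, mass):
    """Mean of energy / mass, i.e. the simple ratio normalization."""
    return sum(e / w for e, w in zip(energy, mass)) / len(energy)


# Synthetic, noise-free data (hypothetical units), built so that the true
# group effect is exactly 1.0 after accounting for body mass.
mass = [40.0, 42.0, 44.0, 46.0, 34.0, 36.0, 38.0, 40.0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
energy = [5.0 + 0.2 * w + 1.0 * g for w, g in zip(mass, group)]

fit = fit_ancova(energy, mass, group)
ratio_reference = mean_ratio(energy[:4], mass[:4])
ratio_treated = mean_ratio(energy[4:], mass[4:])
```

      In this synthetic example the fitted group effect recovers the value built into the data, whereas the difference between the two ratio-normalized means also depends on how far apart the groups are in body mass, because of the non-zero intercept. That dependence is the confounding the covariate approach is meant to avoid.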

      (2) At the thermoneutral point (30 °C), the study could benefit from testing the indirect calorimetric models of human energy physiology. Future studies could also explore this to evaluate the implications for drug development.

      We agree with the reviewer's comments. In future studies, it will be very informative to investigate the effects of STF under thermoneutral conditions, which could provide more consistent data on how drugs affect metabolic processes in humans and thereby improve translational research.

      (3) The current study missed the opportunity to investigate the effects of STF on non-adipose tissue (non-AT) resident macrophage populations, such as those in bone marrow or lymph-node macrophages. Understanding how STF modulates macrophage metabolism in these contexts would be valuable.

      We thank the reviewer for the comments, with which we agree. As this study is focused on insulin resistance and adipose tissue remodeling, we mostly restricted our analysis to adipose tissue macrophage populations. In the future, it would be interesting to investigate the effect of STF on macrophages in non-adipose tissues. This would provide a more comprehensive understanding of STF's effects on immune cell metabolism and could inform its application in various therapeutic areas.

      (4) The study should explore how STF influences the expression of CD9, Trem2, (positive subpopulations), and the secretion of pro-inflammatory cytokines by macrophages, particularly in response to LPS and IFNγ activation in stromal vascular fraction (SVF) cells and bone marrow-derived macrophages (BM-Macrophages).

      We appreciate the reviewer's comments. In obesity, ATMs do not undergo the classical M1/M2 polarization; instead, both M1-like/pro-inflammatory macrophages and M2 macrophages increase drastically. It will be interesting to investigate the effects of STF on the newly identified CD9- and Trem2-positive macrophage subpopulations in SVF and bone marrow macrophages in response to LPS and IFNγ stimulation in the future, although these studies might not faithfully reflect the changes in adipose tissue in obesity, as these stressors typically induce classical M1/M2 polarization.

      (5) Additional macrophage gating is necessary to better understand adipose tissue macrophage (ATM) inflammation. Specifically, CD11c−MHC2 low macrophages represent a newly identified inflammatory and dynamic subset in murine adipose tissue. These ATMs accumulate rapidly after ten days of a high-fat diet (HFD) and should increase further with prolonged HFD. For this study, CD11c−MHC2 low ATMs could be subdivided for flow cytometry analysis based on their MHC2 expression, distinguishing them from CD11c−MHC2 high ATMs. All macrophage subtypes categorized here can be studied for metabolic health using Seahorse analysis as well.

      We appreciate the reviewer's comments. It will be interesting to investigate the effects of STF on the newly identified CD11c−MHC2 low macrophage subpopulation in the future. Future studies can certainly include metabolic analysis with Seahorse, which can relate energy metabolism at the cellular level to organismal thermogenesis.

      (6) All flow cytometry histograms - are they showing mean fluorescence intensity or cell# per population? Please specify. All flow cytometry dot plots - It would be helpful for readers to see populations plotted as bar graphs next to respective flow plots, as opposed to being shown as supplemental tables. Additionally, labeling dot plots with the parent population from which cells were gated on would also help readers understand faster what we're looking at.

      We appreciate the reviewer's comments. In the flow cytometry histograms, the y-axis is “normalized to mode”. This normalization is often used to compare the distribution of fluorescence intensity between different samples. It focuses on the shape of the distribution (with a maximum of 100%) rather than on absolute cell counts, which removes variation caused by different cell numbers or sample sizes and makes it easier to compare populations based on fluorescence intensity. When normalizing to the mode, the highest peak in the histogram is scaled to 100%, and all other values are scaled relative to that peak. This allows easy comparison of multiple histograms even if the total number of cells (or events) differs between samples.
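
      The scaling described above can be sketched in a few lines. This is a minimal illustration, not the code used by any flow cytometry package. The function name and the bin counts are hypothetical, and the input is assumed to be a list of event counts per fluorescence-intensity bin.

```python
def normalize_to_mode(counts):
    """Scale histogram bin counts so that the tallest bin equals 100 (%)."""
    if not counts:
        raise ValueError("histogram has no bins")
    peak = max(counts)
    if peak <= 0:
        raise ValueError("histogram has no events")
    return [100.0 * c / peak for c in counts]


# Two hypothetical samples with the same distribution shape but a tenfold
# difference in the number of recorded events.
small_sample = [10, 40, 20]
large_sample = [100, 400, 200]

small_scaled = normalize_to_mode(small_sample)
large_scaled = normalize_to_mode(large_sample)
```

      After scaling, the two hypothetical samples give identical curves, which is why this y-axis allows shapes to be compared but cannot be read as cell number or as mean fluorescence intensity.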

      (7) The results appear to confuse the actual sample size and p-value. Please carefully review the statistical analyses to ensure that biological replicates are accurately represented. Additionally, include p-values alongside fold change data in the text for clarity.

      We appreciate the reviewer's comments. We have rechecked the statistical analyses and confirmed that the biological replicates are now properly represented. The exact number of biological replicates for each experiment is now clearly specified in both the methods section and the figure legends.

      (8) To further validate the findings, consider using Seahorse analysis at the cellular level in future experiments. This could confirm indirect calorimetric data and thermogenesis responses to cold stimulation.

      We appreciate the reviewer's comments. Yes, Seahorse analysis at the cellular level will be conducted in future experiments.

      (9) Please ensure the use of person-first language, avoiding labels or adjectives that define individuals based on a condition or characteristic.

      We appreciate the reviewer's comments. We have changed the descriptions to use person-first language.

      (10) The manuscript does not demonstrate how STF inhibition of IRE1α in ATM, specifically through CD9 and Trem2, controls diet-induced obesity. This aspect should be further elucidated.

      We appreciate the reviewer's comment. In this study, we observed that STF inhibits IRE1α RNase activity in SVF and in sorted ATMs as well as in adipose tissue. The improvement in diet-induced obesity can be attributed to IRE1α inhibition in both adipocytes and macrophages, as shown previously by myeloid- and adipocyte-specific knockouts of IRE1α. To determine whether IRE1α in CD9- and/or Trem2-positive ATMs controls diet-induced obesity, genetic approaches would be needed to delete IRE1α specifically in CD9- and/or Trem2-positive ATMs. This is technically challenging at present, as there are no CD9- or Trem2-specific Cre lines available.

      Minor:

      (1) Line 43-44: Update terminology to "MASLD" instead of "NAFLD."

      We thank the reviewer for pointing these out. We have changed the terminology in the revision.

      (2) Line 58-59: Add a reference for the mentioned text.

      We thank the reviewer for the comment. We have added a reference in the revision.

      (3) Was the antibody used to detect CD9 and Trem2 validated for FACS and other analyses?

      We thank the reviewer for the comment. In our studies, we determined CD9 and Trem2 expression by flow cytometry and immunostaining. For flow cytometry, we used PE/Dazzle™ 594 anti-mouse CD9 (BioLegend Cat# 124821, RRID:AB_2800601) and APC-conjugated Trem2 (R&D Systems Cat# FAB17291N, RRID:AB_3646995), both of which were validated for FACS. For immunostaining, we used CD9 (Abcam Cat# ab223052, RRID:AB_2922392) and Trem2 (R&D Systems Cat# MAB17291, RRID:AB_2208679).

      (4) Studies were limited to male mice; this should be noted in the title and discussed as a limitation.

      We thank the reviewer for the comment. We have modified the wording in the revision.

      (5) Ensure all reagents are fully described with preparation details and identifiable numbers for reproducibility and/or submit the FACS protocol to any protocol archives.

      We thank the reviewer for the suggestions. Yes, we have modified the wording in the revision.

      (6) Provide the correct version numbers for all software used (FlowJo, Prism, etc.).

      We thank the reviewer for the suggestions. We have provided the correct version numbers for the software used (FlowJo and Prism).

      (7) Specify section size (µm) and blocking agent used for eWAT immunofluorescence (Line 207).

      We thank the reviewer for the suggestions. We have added this information.

      (8) Add gene accession numbers to Supplementary Table 3.

      We thank the reviewer for the suggestions. We have added this information.

      (9) Figure 2: Clarify HFD and treatment timelines with a schematic diagram.

      We thank the reviewer for the suggestions. We have added a schematic diagram in Supplemental Figure 1C.

      (10) For histology analysis, the minimum combined data from triplicate images is shown in Figure 2C-2H. For Figures 2E and H, provide complete methods for histology analysis.

      We thank the reviewer for the comments. For the histology analysis shown in Figures 2C–2H, we used a minimum of three mice per treatment group. For each mouse, 3–5 images were taken for analysis. All histology analyses were conducted using ImageJ for image quantification, and the data were processed and organized using Excel and Graphpad.

      (11) Figure 3D: The macrophage marker F4/80 stains differently than in Figure 5B; to avoid false-positive staining, show an isotype control to confirm actual staining. For eWAT immunofluorescence (Figures 3D, 5B, 6E), counterstaining is needed in addition to the macrophage marker, such as perilipin for adipocytes and phalloidin for total cells.

      We thank the reviewer for the comments. Yes, the macrophage marker F4/80 stains differently in Figure 3D than in Figure 5B because the tissues differ: Figure 3D shows liver samples, whereas Figure 5B shows adipose tissue. In the liver, subsets of macrophages are known as Kupffer cells. Kupffer cells have distinct morphology and behavior compared to other tissue-resident macrophages. When stained with F4/80 in the liver, the pattern may reflect the specialized role of Kupffer cells, typically showing a more diffuse or localized staining around blood vessels and sinusoids. In adipose tissue, macrophages tend to accumulate around dead or dying adipocytes, forming what is known as "crown-like structures" (CLS). The F4/80 staining in adipose tissue shows a more clustered pattern, particularly around areas of fat tissue undergoing remodeling or inflammation. In adipose tissue, clearly defined cells remain visible even without a counterstain such as perilipin, and, importantly, adipocytes are generally much larger than macrophages. We agree that counterstaining would enhance accuracy. In future studies, we will use perilipin staining to make it easier to differentiate adipocytes from other structures and provide stronger data.

      (12) Insert scale bars in the original images for Figures 3D, 4I, 4M, 5B, 6E, S3B, S6D-E, and S7A-B. In all of these images, the scale bar appears to have been added afterwards rather than inserted during acquisition or by the imaging software.

      We thank the reviewer for the suggestions. The scale bars inserted during image acquisition are not clearly visible at the resolution shown and can only be seen when the images are enlarged. In the revision, we have manually added the scale bars for clarity.

      (13) Figure 5E: Please label X-axis as F4/80.

      We thank the reviewer for pointing this out. The label has been added in the revision.

      (14) Figure 5F: It is specified in the legend that cells were gated on F4/80+CD11b+CD11c+, but there is a CD11c- population shown in the histogram...How is this population appearing if all cells should be CD11c+?

      We thank the reviewer for pointing this out. We gated on CD11c within the F4/80+CD11b+ population and have corrected the description in the legend accordingly.

      (15) Figure 5G: What is the F4/80+CD11b+CD11c-CD206- population gated in quadrants?

      We thank the reviewer for the comment. The F4/80+CD11b+CD11c-CD206- population was shown in Figure 5G on the lower left side, with the percentages being 15.7% for ND, 5.54% for Veh-HFD, and 26% for STF-HFD.

      (16) Figure 6J: Flow cytometry gates seem slightly misplaced and the sample appears to be overcompensated - were FMOs included in this experiment to establish proper gates? If so, please include.

      We thank the reviewer for the comment. In the study, we did include Fluorescence Minus One (FMO) control in the experiment to establish proper gating. We have included this information in the methods section.

      (17) Table 1-3: Indicate the number of replicates (n=) used in all tables.

      We thank the reviewer for the suggestion. We have provided the specific number of mice used in the study within the figure legends.

    1. eLife Assessment

      This study describes the impact of mycobacterial genetic diversity on host-infection phenotypes by assessing the effect of different M. tuberculosis lineages on granulomatous inflammation using a 3D in vitro granuloma model. Despite being descriptive and showing mostly correlative relationships, the useful findings and data provide some solid support regarding the functional impact of M. tuberculosis natural diversity on host-pathogen interactions. The study will interest researchers working on mycobacteria and how genetic diversity influences virulence and immunity outcomes.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript reports a comparison of microbial traits and host response traits in a laboratory model of infected granuloma using Mtb strains from different lineages. The authors report increased bacillary growth and granuloma formation, inversely associated with T cell activation that is characterized by CXCL9, granzyme B and TNF expression. They therefore infer that these T cell responses are likely to be host-protective and that the greater virulence of modern Mtb lineages may be driven by their ability to avoid triggering these responses.

      Strengths:

      The comparison of multiple Mtb lineages in a granuloma model that enables evaluation of the potential role of multiple host cells in Mtb control, offers a valuable experimental approach to study the biological mechanisms that underpin differential virulence of Mtb lineages that has been previously reported in clinical and epidemiological studies.

      Weaknesses:

      The study is rather limited to descriptive observations, and lacks experiments to test causal relationships between host and pathogen traits. Some of the data presentation is difficult to interpret, and some conclusions are not adequately supported by the data.

      Comments on revisions:

      The authors have addressed my previous comments with appropriate revisions and explanations.

    3. Reviewer #3 (Public review):

      Arbués and colleagues describe the impact of mycobacterial genetic diversity on host-infection phenotypes. The authors evaluate Mtb infection and contextualize host-responses, bacterial growth and metabolic transitioning in vitro using their previously established model of blood-derived, primary-human-cells cultured within a collagen/fibronectin matrix. They seek to demonstrate the effectiveness of the model in determining mycobacterial strain specific granuloma-dependent host-pathogen interactions.

      Understanding the way mycobacterial genetic diversity impacts granuloma biology in tuberculosis is an important goal. One of this work's strengths is the use of primary human cells and two constituents of pulmonary extracellular matrix to model Mtb infection. The authors and others have previously shown that Mtb-infected PBMC aggregates share important characteristics with early pulmonary TB granulomas. Use of multiple genetically distinct strains of Mtb defines this work and further bolsters its potential impact. However, the study is not comprehensive, as lineages 6 and 7 are not tested. Experiments are primarily descriptive, and the methodologies are conventional. Correlative relationships are the manuscript's focus, and effect sizes are generally small.

      The main aim of this work is to extend an in vitro granuloma model to the study of a large collection of well-characterized, genetically diverse representatives of the Mycobacterium tuberculosis complex (MTBC). I believe that they accomplish that aim. The work does investigate MTBC infection of aggregated PBMCs using three strains each of Mtb lineages 1-5 and H37Rv, which is not a trivial undertaking. The experimental aims are to show that MTBC genetic diversity impacts growth and dormancy of granuloma-bound bacteria and the host responses of granulomatous aggregation as well as macrophage apoptosis, lymphocyte activation and soluble mediator release within granulomas. The methodologies employed are sufficient to test most of these aims. The authors' conclusions regarding their results are mostly supported by the data. The conclusion that lineage impacts growth within granulomas is likely true and the data as presented reflect such a relationship. Their conclusions regarding lineage's impact on dormancy are partially supported, as their findings demonstrate that assays for dormancy identify strain-specific metabolic changes in the bacteria consistent with a dormancy-like state but also identify replicating bacteria as being dormant. The data strongly support the impact of mycobacterial genetic diversity on a spectrum of granulomatous responses in their model system. Those findings are a highlight of the publication. The data further support the idea that strain diversity impacts macrophage apoptosis, but a relationship of apoptosis to the granulomatous response is not effectively evaluated. The association of lymphocyte activation with reduced mycobacterial growth as an aspect of granulomas is well documented in the literature, and a negative correlation between T cell activation and growth is supported by the authors' results. Their data also support the conclusion that soluble mediator production by PBMCs is different based on the infecting strain of mycobacteria and that IL1b modulates aggregate phenotypes in their model.

      The authors contribute some valuable insights, particularly in figure 3. Their model is higher echelon relative to others in the field, but I don't believe that it possesses all the components necessary to replicate formation of mycobacterial granulomas in vivo. That being said, their identification of donor-dependent aggregation phenotypes by mycobacterial strain has the potential to enable future investigations of human and mycobacterial genetic components that are involved in the formation of TB granulomas.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The analysis of the dormancy rates is interesting and offers some intriguing questions related to the higher dormancy rate found for the L2 isolates and lower for the L3 ones. It will be interesting in the future to expand the data generated in this advanced in vitro platform to in vivo studies.

      Indeed, an increased dormancy propensity of L2 isolates was previously reported in broth culture and associated with specific genetic polymorphisms. The opposite phenotype observed in the L3 isolates is particularly intriguing and has not been described to date. Hence, we fully agree that it would be very interesting to find out whether these phenotypes are also observed in vivo.

      The authors propose that ‘strains exhibiting greater proliferative capacity are more prone to induce macrophage apoptosis, thereby contributing to the extent of the granulomatous response.’ It would be interesting to know what happens if the macrophage apoptotic response is blocked.

      This is an interesting suggestion that would deserve a dedicated comprehensive investigation covering other cell death pathways. Even though the trend is significant, the correlation coefficient is rather low in this interaction, which looks a fortiori due to substantial inter-host variability in the apoptotic propensity of macrophages from individual donors to a given strain. In addition, such blocking experiments may require performing isolated macrophage infections that would fall outside of the scope of this study, or considering the extent and the contribution of the apoptosis of other cell subsets. 

      In contrast to macrophage apoptosis, T cell activation correlated with less replicative bacteria. Are these two findings related, ie, are the granulomas showing more (apoptotic) macrophages the ones with a lower percentage of activated T cells? This would shed light on what distinguishes granulomas that are protective from those that support bacterial growth. 

      Indeed, a significant negative correlation between macrophage apoptosis induction and T cell activation can be observed, specifically with activated CD4 T cells expressing CD38 (rS = -0.36, p < 0.05) or CD69 (rS = -0.40, p < 0.01). We have added this additional result in the manuscript text (line 217).
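
      For readers less familiar with the rS statistic quoted in this response, the following is a minimal sketch of Spearman's rank correlation (the Pearson correlation of the ranks, with tied values sharing their average rank). The function names and the input values are hypothetical and serve only as illustration; they are not the study's data or the authors' analysis code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the current value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical percentages, for illustration only (not data from the study).
apoptotic_macrophages_pct = [5.0, 8.0, 12.0, 15.0, 20.0]
cd69_cd4_t_cells_pct = [30.0, 28.0, 21.0, 22.0, 10.0]
rs = spearman_r(apoptotic_macrophages_pct, cd69_cd4_t_cells_pct)
```

      A negative value of rs indicates that samples ranking higher on one measure tend to rank lower on the other; the sketch does not compute the accompanying p-value.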

      It would also be interesting to know the functional impact of blocking early CXCL9 or IL1b on the outcome of granulomatous response/bacteria growth.

      We have performed the suggested early blocking experiments and added the expected negative effect on granuloma formation upon neutralization of IL-1b (current Fig. 6E) in the revised version of the manuscript, and furthermore discussed the null effect on bacterial growth of the treatment with an anti-CXCL-9 specific antibody (current Fig. 6H).

      The authors acknowledge the absence of neutrophils in this model. However, this could be discussed in more detail, as neutrophils play an important part in TB pathogenesis as shown in different models of infection and human TB. 

      We concur and have expanded on the importance of neutrophils in TB pathogenesis (including references) in the discussion section (line 260).

      Related to neutrophils and TB pathogenesis, another important player is type I IFN. The multiplex assay used included IFN-alpha, was this molecule detected? If so, was there any difference in the levels of type I IFN detected among the different infections?

      We agree, and that is why we had originally included IFN-α in our screen. However, this cytokine remained under the limit of quantification at both studied time points, preventing us from drawing conclusions on the effect of Mtb strain diversity on the secretion of type I IFNs in in vitro granulomas.

      Reviewer #2:

      In Figure 1b/c, it is not clear what comparisons are being made to give the p-value annotations.

      In Figure 2a/b, it is not clear what comparisons are being made to give the p-value annotations.

      In Figure 3a, again it is not clear what comparisons are being made to give the p-value annotation.

      The p-values formerly present in the upper left corner of the panels resulted from either Friedman (Figures 1C, 2A and 3A) or Kruskal-Wallis (Figures 1B and 2B) tests and indicated whether there was a significant overall difference between the analyzed groups. To avoid confusion, those values have been removed, leaving only the post-test comparisons between specific groups.

      In the results narrative related to Figure 1 (lines 93-103), the authors refer to lineage heterogeneity without providing any objective quantification of this - I suggest they do so, by providing variance or standard deviations. 

      Thank you very much for this relevant suggestion, we have now included the coefficients of variation as a quantitative measure of the within-lineage heterogeneity in the manuscript (line 97). 
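
      As a brief illustration of the measure mentioned in this response, the coefficient of variation is the standard deviation expressed relative to the mean. The sketch below assumes the sample standard deviation and uses hypothetical growth-rate values and lineage groupings; it is not the authors' analysis code.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical growth rates for three strains per lineage (illustration only).
growth_rates_by_lineage = {
    "L1": [2.0, 4.0, 6.0],  # widely spread strains
    "L2": [3.9, 4.0, 4.1],  # tightly clustered strains
}

cv_by_lineage = {
    lineage: coefficient_of_variation(rates)
    for lineage, rates in growth_rates_by_lineage.items()
}
```

      Because the measure is scaled by the mean, it allows within-lineage spread to be compared between lineages whose average growth rates differ.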

      I also suggest the authors explain what the data points actually represent in this figure - do I assume each data point = cfu from a well of 'granuloma'? Are they all from the same donor PBMC? What is the sample N for each lineage? If the data are not from the same donor PBMC, I think more informative to present the results of paired statistical analyses, stratified by donor cells. In addition, the authors should include a summary table of the demographic characteristics of the donors (at least sex, ethnicity, and age). If the data are derived from a single donor, I'd advocate providing data from at least one further donor.

      In the new supplementary figure requested by Reviewer 3, Figure 1—figure supplement 1 (actual CFU data on days 1 and 8 p.i. used to calculate the growth rate), it is now indicated that bacterial load was quantified as CFU per well.

      Regarding the number of donors used, as stated in the Material and Methods section (current line 418) and depicted by the four different shapes used when data are grouped by individual infecting strain, all figures in our manuscript have been generated using PBMCs from 4 independent donors. For greater clarity, “n = 4” has now been included in the figure legends. Regarding the statistical analyses, paired statistical analyses stratified by donor were already performed in the original version of the manuscript whenever appropriate. 

      As stated in the methods section, the buffy coats used for PBMC isolation are anonymized so demographic data are unavailable.

      The premise of the analysis in Figure 2c and the results narrative ("This finding suggests that an increased ability to enter dormancy is not necessarily associated with a more pronounced growth phenotype", line 132) is not clear to me. Why would increased dormancy relate to increased growth in the same context? I suggest this analysis be removed.

      We apologize for the confusion in our original statement. We now rephrased it as “This finding suggests that an increased tendency to remain in a metabolically active state is not necessarily associated with a more pronounced growth phenotype”.

      In Figure 3b, I think it may be more informative if the data points from the same donor were linked. Likewise in Figure 3c, I'd like to see a donor-paired statistical analysis.

      For all figures, the choice of using individual symbols to identify data points from the same donor but not connecting lines was made to provide a neater image. Nevertheless, we have now modified the figure linking the data points from the same donor. The statistical analysis performed is always donor-paired whenever appropriate. 

      The causal inference suggested in the results narrative between ‘macrophage apoptosis’ and granulomatous response (lines 173-175) is not tested directly by the experiment – I suggest the authors exclude this statement.

      Fair point, the statement has been removed.

      To what extent have the authors considered whether variation in T cell responses between lineages may be confounded by variation in Mtb reactive T cell frequencies in donor PBMC. Can this be disentangled at all? This should be acknowledged as a potential limitation of the study.

      We did characterize the presence of mycobacterial antigen-specific reactive T cells in the PBMCs from the investigated donors. To do so, we performed in vitro stimulations with purified protein derivative (PPD) or an ESAT-6/CFP-10 peptide pool and quantified the frequency of IFN-γ-positive CD4 T cells by flow cytometry. The percentage of IFN-γ-positive CD4 T cells recalled by PPD stimulation ranged from 0.02% to 0.13%, while no ESAT-6/CFP-10 reactive T cells were detected. As such, we can attest that the PBMC donors never encountered Mtb, even though some levels of memory recalled by PPD may be due to cross-reactivity with BCG or pre-exposure to non-tuberculous mycobacteria. We have now added a panel in Figure 5—figure supplement 2 representing the frequency of mycobacteria-specific CD4 T cells and, as suggested, discussed the impact on the extent of the T cell responses observed in granulomas in the revised version of the manuscript. Nevertheless, the observed MTBC strain-specific trends are consistent across the donors, as depicted in Figure 5B and Figure 5—figure supplement 2A-B.

      Moreover, the experimental design does not really test cause and effect for the relationship between T cell proliferation/activation and bacterial growth. What is the impact of T-cell depletion from PBMC on bacterial growth?

      The increased TB susceptibility of HIV patients demonstrated that T cells play a critical part in the control of Mtb infection. We agree and did envisage such a depletion experiment. However, depleting T cells from PBMCs would imply removing up to 70% of the cells present in the specimen, which would lead to a situation from which results cannot be compared to the original sample and therefore would not be interpretable. 

      Reviewer #3:

      Data presentation:

      - In Figure 1 (replication rate), actual cumulative CFU means from each strain for both days 1 and 8 with statistical analysis should be presented as panels in this figure.

      Agreed. We are providing the requested representation of the data and the corresponding paired statistical analysis as supplementary material Figure 1—figure supplement 1.

      - In Figure 2 (dormancy), a panel comparing the mean number of bacteria that are single positive for either Auramine-O, Nile Red, or are double positive should be included for each strain, with statistical analysis. Representative photomicrographs of phenotypes from the staining should also be included. Electron microscopy could be conducted to compare the presence of intermediate lipid inclusions within organoid-bound mycobacteria.

      As requested, percentages of single stained as well as double positive bacilli in each sample are now represented in Figure 2—figure supplement 1. In addition, we have now also followed the request and included a photomicrograph picturing representative Mtb staining phenotypes. Lastly, it would certainly be very elegant to visualize the presence of Mtb lipid inclusions within cellular aggregates by electron microscopy. However, we do not currently have the means for such investigations and the implementation of such a protocol under BSL3 conditions appears unrealistic in the context of this study.  

      - In Figure 3 (granulomatous response), the number, circularity, and size of immune aggregates are presented as "granuloma score" in which the mean ratio of size to circularity is divided by the number of inclusions. To their credit, in Supplementary Figure 2, the authors provide the data in a straightforward manner. However, the granuloma score metric is reduced as the number of observed "granulomas" increases, which is counterintuitive. Additionally, circularity is not a definitive aspect of human granulomas (Wells et al., Am J Respir Crit Care Med, 2021, PMID: 34015247). I am skeptical that the "granuloma score" is an accurate predictor of granulomatous inflammation. Is there precedent for this metric in the literature? If so, a reference should be provided. A high magnification inset of 1 representative granuloma from each strain should be included in Figure 3A.

      As requested, insets of a representative average granuloma for each strain have been included in Figure 3A. The formulation of the “granuloma score” has no precedent and cannot be referenced. With this score, we meant to integrate within one single parameter the visual differences represented in the current Figure 3—figure supplement 2. We intentionally sought to assign the highest score to the massive aggregation that some strains may promote, in contrast to strains that trigger several small, dispersed and diffuse aggregates.
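
      To make the behavior discussed in this exchange concrete, the sketch below implements the score as the reviewer paraphrases it: the mean ratio of aggregate size to circularity, divided by the number of aggregates. This reading of the formula is an assumption, the exact definition in the manuscript may differ, and the function name, units and values are hypothetical.

```python
from statistics import mean


def granuloma_score(aggregates):
    """Score one well from a list of (area, circularity) pairs.

    Assumed formula, following the reviewer's paraphrase:
    mean(area / circularity) / number of aggregates.
    """
    if not aggregates:
        return 0.0
    ratios = [area / circularity for area, circularity in aggregates]
    return mean(ratios) / len(aggregates)


# Hypothetical wells (arbitrary area units), for illustration only.
single_massive_aggregate = [(9000.0, 0.9)]
several_small_aggregates = [(900.0, 0.9)] * 4

score_massive = granuloma_score(single_massive_aggregate)
score_dispersed = granuloma_score(several_small_aggregates)
```

      Under this assumed formula, a well with one massive aggregate scores far higher than a well with several small ones, which is the design intent the authors describe, and the score falls as the aggregate count rises, which is the property the reviewer finds counterintuitive.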

      - In Figure 4 (macrophage apoptosis), a panel showing the percentage of dual Annexin V and 7-AAD positive cells should be included to provide the reader with the relative scope of ongoing apoptotic vs necrotic/secondary necrotic death in the model. If the data is readily available, including a control of uninfected PBMCs would also allow the reader to evaluate donor-dependent differences of in vitro cell death at baseline.

      No significant differences were observed in the percentage of dual Annexin V- and 7-AAD-positive macrophages (necrosis/secondary necrosis) between the MTBC strains at this time-point. Nevertheless, we have disclosed this result in the revised manuscript as Figure 4—figure supplement 2.

      - In Figures 5 and 6 (lymphocyte activation and soluble mediator secretion), panels showing unscaled data should be included. Panels depicting the unscaled immunoassay protein readings (pg/mL) by strain for CXCL9, granzyme B, and TNF with statistical analysis should be included in Figure 6.

      As requested, unscaled lymphocyte activation and soluble mediator data have been included as Figure 5—figure supplement 2 and Figure 6—figure supplement 1, respectively (replacing former supplementary figures 5 and 7). In addition, the updated Figure 6G panel now depicts correlation analysis with the unscaled cytokine concentrations.

      The DosR-regulon:

      The authors hypothesize that differences in the prevalence of the dormancy metrics (acid-fastness or lipid inclusion prevalence) are due to strain-specific increases in expression of the DosR regulon within the model's hypoxic conditions (lines 107-114, 126-127). The claim that their model is equipped to evaluate dosR-dependent mycobacterial phenotypes was also previously proposed (Arbués et al., 2021) and should be tested. A comparison of the dosR-dependent gene expression of each strain in PBMC aggregates and broth culture by qRT-PCR would test this idea at a very basic level.

      We agree. Actually, a similar request was made during the revision of our first in vitro granuloma study for which such qPCR data were generated and presented in Fig. 1 D (PMID: 32069329). In addition, the work of Kapoor et al., who originally developed the in vitro granuloma model also demonstrated the induction of most of the DosR regulated genes by qPCR (PMID: 23308269). We trust that the reviewer will agree that this does not need to be repeated.

      The modern Beijing lineage strain L2C:

      The authors claim (Line 101-102) that the results of Figure 1 "confirm the higher virulence propensities of strains from modern lineages". From the data presented, it appears that strain L2C (Modern-Beijing) dominates the modern vs ancestral and inter/intra-lineage phenotypes of replication, dormancy, and apoptosis. Are significant differences between modern and ancestral lineages or between strains simply a facet of the distinct profile of L2C? Do the statistical differences disappear when the L2C group is excluded?

      Indeed, among the modern lineages’ isolates, L2C exhibits a hypervirulent profile in terms of bacterial replication. However, the difference between modern and ancestral strains remains statistically significant when L2C is excluded from the analysis (p = 0.002). That is also the case when we analyze the proportion of dormant bacteria. Exclusion of L2C strain results in a Kruskal-Wallis overall p = 0.005, and p = 0.0002 when we compare L2 vs. L3. Lastly, regarding the percentage of apoptotic macrophages, if we use L2B (instead of L2C) to compare, the difference is still significant vs. L1A (p = 0.008) although there is no longer a trend for L2A (p = 0.1).

      "Dormancy":

      Dormancy is definitively a non-replicative state, where bacterial growth is absent. The authors' findings and claims appear to be incompatible with that definition, which they acknowledge (Lines 130-135). The lack of correlation between growth and dormancy in their model is supported with reference to Figure 2C, a Spearman's analysis of dormancy ratio with growth rate (inclusive of all strains under consideration). The figure supports a model where "dormancy" and "growth rate" are disjunct but also appears to show high "dormancy" accompanying increasing "growth" in the L2C group. How are strains able to grow if they are in a non-replicative state? Are the "growth rate" assays actually measures of survival? Are there different rates of infectivity? Are the bacteria growing cellularly in the serum-rich ECM, etc. etc? We need to see the hard CFU and Nile Red, and Auramine-O data to contextualize these findings. Alternatively, could the accumulation of inclusions in the model not be a reliable dormancy metric (Fines et al., BioRxiv [Preprint], 2023, PMID: 37609245)?

      We fully agree. The Nile red profiles are always relative and only depict the proportion of the population that has entered a dormant state. Nevertheless, dormancy can be dynamic and bacteria may swiftly resuscitate in that model. Furthermore, and as depicted in Figure 2—figure supplement 1, despite showing an increased tendency to enter a dormant-like state, a considerable population of lineage 2 bacilli still remains metabolically active and in a replicative state. The referred preprint is very interesting and we will follow it up closely.

      Specificity of responses to PBMC aggregation:

      The authors claim that their results "reveal a broad spectrum of granulomatous responses" (Line 73) but do not show any aggregation specificity of PBMC responses beyond the model's intrinsic metrics of area and circularity. To establish that their phenotypes such as lymphocyte activation, cytokine release, cell death, or mycobacterial acid-fastness/lipid inclusion prevalence, are aspects of the granulomatous response the authors could infect PBMCs from the same donors with the same strains and perform the same assays using established Mtb-PBMC models in which the cells do not aggregate. This would answer many important questions, for example, does the rate of macrophage infection account for variability in apoptosis percentage? A phagocytosis assay and quantification of stained intracellular mycobacteria within recently infected PBMCs could be conducted to determine if phenotypes are an aspect of granulomatous aggregation or due to strain-specific differences in cell-intrinsic macrophage immunity. It would also be very informative to know what percentage of PBMCs and mycobacteria are granuloma-bound in the ECM.

      We are not aware of Mtb-PBMC models in which the cells do not aggregate. We previously compared PBMC infection models in the presence or absence of the collagen matrix, and cells also spontaneously coalesced around infection foci (PMID: 34603299). Regarding the last point, the melting step of the collagen matrix requires enzymatic digestion and pipetting that dislocate the aggregates. Accordingly, we cannot distinguish the bacteria that would remain within the matrix from those replicating within cellular aggregates. However, we did resolve this question by demonstrating that the bacteria were not able to grow in the absence of cells in this culture condition (Supplementary material, PMID: 34603299).

      Minor recommendations

      - The term TNF-a should be replaced with TNF throughout the manuscript.

      We acknowledge that the term TNF-a can be interchangeable with TNF. However, we chose to use the TNFα terminology to differentiate it from lymphotoxin α, which is also referred to as TNF-β.

      - The authors cite studies conducted in murine and NHP models to support the claim that "understanding of immune protective traits in TB remains insufficient and yet dominated by data from mouse and non-human primate studies" (Lines 63-64) but ignore an abundance of data from other in vivo and in vitro models that have provided numerous valuable insights in the field of TB immunology. This line should be revised or omitted.

      For us, the term “dominate” implies that these models are widely used, not that they are the only ones. Other models have indeed provided additional relevant data. We cite the lung-on-chip model from McKinney’s lab and the in vitro granuloma model from Elkington’s lab (line 66). We would be very happy to include more references upon further specification, even though we cannot build an extensive review here.

      - The authors claim that their model "encompasses, with the exception of neutrophils, all immune cell types involved in TB" (Lines 67-68). To support this claim, they should provide additional references or data demonstrating that the PBMC aggregates include eosinophils, mast cells, dendritic cells, yolk-sac-derived alveolar macrophages, and Langhans giant cells.

      With the aim of providing more accurate and detailed information regarding the cell types present in the model, the sentence has been reformulated as: “The model encompasses all PBMC-derived cell types involved in TB immune responses, but lacks granulocytes (i.e. neutrophils, eosinophils, basophils and mast cells)” (line 260). Notably, the presence of multinucleated giant cells was reported in Kapoor’s paper describing the in vitro granuloma model for the first time (PMID: 23308269).

      -  As an additional note, the title can be improved and made more broadly accessible by revising the use of the acronyms CXCL9, granzyme B, and TNF-α.

      To render the title more broadly accessible we propose to replace the listed acronyms with “soluble immune mediators”, but we remain open to more appropriate and specific suggestions.

      Answers to the reviewers’ public comments

      Reviewer #1:

      First of all, we would like to thank the reviewers for their feedback and suggestions to improve our manuscript. To strengthen the findings of our study, we have performed and added results from IL-1β and CXCL9 blocking experiments evaluating the impact on the granulomatous response and bacterial load, respectively. In the revised version of the manuscript, while we discuss the null effect on bacterial growth of the treatment with an anti-CXCL9 antibody and the potential reason behind it, we are now reporting a negative effect on the magnitude of granuloma formation upon neutralization of IL-1β that the correlation analysis had initially suggested.

      Reviewer #2:

      The revised version of our manuscript now incorporates all the points detailed in the private answers to the reviewer, including clarifications on the statistical tests performed, additional supplementary materials to transparently disclose the raw data behind the normalization approach, as well as flow cytometry data on the immune memory status of the blood donors. In addition, and as stated in the answer to reviewer #1, to test causal relationships between some host and pathogen traits, we have now performed and provided data and interpretation of IL-1β and CXCL9 blocking experiments.

      Reviewer #3:

      We are thankful for, and concur with, these constructive comments and insights. We have now consistently revisited the statistics in the figures to improve clarity and included new supplementary figures reporting the raw data that were missing in the initial version of the manuscript. In addition, and as mentioned in the answers to reviewers #1 and #2, we have now performed and added IL-1β and CXCL9 blocking experiments to test causal relationships between specific host and pathogen traits. In particular, we are now reporting a negative effect on the magnitude of granuloma formation upon neutralization of IL-1β that the correlation analysis had initially suggested.

      More specifically, regarding the point that our method for bacterial collection calls into question whether all Mtb plated for CFU assay resided within granulomatous aggregates, we previously reported that Mtb growth strictly required the presence of human cells in our culture conditions (Supplementary material, Arbués et al, 2021, PMID: 34603299). In the presence of cells, our microscopy read-out does allow us to observe extra-cellular growth if infections are carried on beyond an 8-day limit, which we applied in the current study to exclude this particular caveat. 

      Concerning the apparently conflicting observation that those strains displaying an increased tendency to enter a dormant-like state are the ones exhibiting the highest replication rates, we would like to point out that a considerable population of bacilli still remains metabolically active and in a replicative state. For instance, and as depicted in Figure 2—figure supplement 1, despite showing an increased tendency to enter a dormant-like state, a considerable population of lineage 2 bacilli does remain metabolically active. Moreover, dormancy can be dynamic and bacteria may swiftly resuscitate.

      Regarding the mentioned limitations of our study that we have discussed in the revised version of our manuscript, we fully concur that PBMC-based in vitro granuloma models lack tissue structure as well as some important stromal and immune cellular players. Nevertheless, we and others demonstrated the particular relevance of the 3-dimensional infection approach within a matrix of collagen and fibronectin by providing mechanistic insights into Mtb resuscitation previously associated with treatment with various immunomodulatory drugs (Arbués et al., 2020, PMID: 32069329; Tezera et al., 2020, PMID: 32091388).

    1. eLife Assessment

      This manuscript describes the impact of modulating signaling by a key regulatory enzyme, Dual Leucine Zipper Kinase (DLK), on hippocampal neurons. The results are interesting and will be important for scientists interested in synapse formation, axon specification, and cell death. The authors have carefully addressed the comments made by the reviewers and the findings are convincing in large part due to the use of extensive mouse genetics, detailed gene expression of enriched genes, and recognition of neuron vulnerability.

    2. Reviewer #1 (Public review):

      Summary:

      In this work Ritchie and colleagues explore functional consequences of neuronal over-expression or deletion of the MAP3K DLK that their labs and others have strongly implicated in axon degeneration, neuronal cell death, and axon regeneration. Their recent work in eLife (Li, 2021) showed that inducible over-expression of DLK (or the related LZK) induces neuronal death in the cerebellum. Here, they extend this work to show that inducible over-expression in Vglut1+ neurons also kills excitatory neurons in hippocampal CA1, but not CA3. They complement this very interesting finding with translatomics to quantify genes whose mRNAs are differentially translated in the context of DLK over-expression or knockout, the latter manipulation having little to no effect on the phenotypes measured. The authors note that several genes and pathways are differentially regulated according to whether DLK is over-expressed or knocked out. They note DLK-dependent changes in genes related to synaptic function and to the cytoskeleton and ultimately relate this in cultured neurons to findings that DLK over-expression negatively impacts synapse number and changes microtubules and neurites, though with a less obvious correlation.

      Strengths:

      Where this work represents a conceptual advance is in defining DLK-dependent changes in translation. Moreover, the finding that DLK may differentially impact neuronal death will become the basis for future studies exploring whether DLK contributes to differential neuronal susceptibility to death, which is a broadly important topic.

      Comments on the latest version:

      The addition of the P10 data is an important advance. With this, the authors have satisfactorily addressed the concerns that I raised.

    3. Reviewer #2 (Public review):

      This manuscript describes the impact of deleting or enhancing the expression of the neuronal-specific kinase DLK in glutamatergic hippocampal neurons using clever genetic strategies, which demonstrates that DLK deletion had minimal effects while overexpression resulted in neurodegeneration in vivo. To determine the molecular mechanisms underlying this effect, ribotag mice were used to determine changes in active translation which identified Jun and STMN4 as DLK-dependent genes that may contribute to this effect. Finally, experiments in cultured neurons were conducted to better understand the in vivo effects. These experiments demonstrated that DLK overexpression resulted in morphological and synaptic abnormalities.

      Strengths:

      This study provides interesting new insights into the role of DLK in the normal function of hippocampal neurons. Specifically, the study identifies:

      (1) CA1 vs CA3 hippocampal neurons have differing sensitivity to increased DLK signaling.

      (2) DLK-dependent signaling in these neurons is similar to but distinct from the downstream factors identified in other cell types, highlighted by the identification of STMN4 as a downstream signal.

      (3) DLK overexpression in hippocampal neurons results in signaling that is similar to that induced by neuronal injury.

      The study also provides confirmatory evidence that supports previously published work through orthogonal methods, which adds additional confidence to our understanding of DLK signaling in neurons. Taken together, this is a useful addition to our understanding of DLK function.

      Comments on the latest version:

      The authors have sufficiently addressed all issues raised with the initial manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript describes the impact of modulating signaling by a key regulatory enzyme, Dual Leucine Zipper Kinase (DLK), on hippocampal neurons. The results are interesting and will be important for scientists interested in synapse formation, axon specification, and cell death. The methods and interpretation of the data are solid, but the study can be further strengthened with some additional studies and controls.

      We greatly appreciate the thorough review and thoughtful suggestions from the reviewers and editors on our original manuscript. We provide a point-by-point response below. We added new studies on P10 mice and controls as suggested, and revised the figures and text for clarification. The revised manuscript includes three new supplemental figures; major text revisions are copied under each response.

      Reviewer #1 (Public Review):

      Summary:

      In this work, Ritchie and colleagues explore functional consequences of neuronal over-expression or deletion of the MAP3K DLK that their labs and others have strongly implicated in both axon degeneration, neuronal cell death, and axon regeneration. Their recent work in eLife (Li, 2021) showed that inducible over-expression of DLK (or the related LZK) induces neuronal death in the cerebellum. Here, they extend this work to show that inducible over-expression in Vglut1+ neurons also kills excitatory neurons in hippocampal CA1, but not CA3. They complement this very interesting finding with translatomics to quantify genes whose mRNAs are differentially translated in the context of DLK over-expression or knockout, the latter manipulation having little to no effect on the phenotypes measured. The authors note that several genes and pathways are differentially regulated according to whether DLK is over-expressed or knocked out. They note DLK-dependent changes in genes related to synaptic function and the cytoskeleton and ultimately relate this in cultured neurons to findings that DLK over-expression negatively impacts synapse number and changes microtubules and neurites, though with a less obvious correlation.

      Strengths:

      This work represents a conceptual advance in defining DLK-dependent changes in translation. Moreover, the finding that DLK may differentially impact neuronal death will become the basis for future studies exploring whether DLK contributes to differential neuronal susceptibility to death, which is a broadly important topic.

      We thank the reviewer for the comments on the value of our work.

      Weaknesses:

      This seems like two works in parallel that the authors have not yet connected. First is that DLK affects the translation of an interesting set of genes, and second, that DLK(OE) kills some neurons, disrupts their synapses, and affects neurite growth in culture.

      Specific questions:

      (1) Is DLK effectively knocked out? The authors reference the floxed allele in their 2016 work (PMID: 27511108), however, the methods of this paper say that the mouse will be characterized in a future publication. Has this ever been published? The major concern is that here the authors show that Cre-mediated deletion results in a smaller molecular weight protein and the maintenance of mRNA levels.

      We apologize for the out-of-date citation of the DLK(cKO)<sup>fl/fl</sup> mice. The DLK(cKO)<sup>fl/fl</sup> mice have been published in (Li et al., 2021; Saikia et al., 2022); excision of the floxed exon was verified using several Cre drivers (Pv-Cre, AAV-Cre, and VGlut1-Cre in this study). The floxed exon contains the initiation ATG and 148 amino acids. By western blot analysis using antibodies against C-terminal peptides of DLK on cerebellar extracts (in Li et al., 2021) and hippocampal extracts (this study), the full-length DLK protein was significantly reduced (Fig 1A-B); DLK is expressed in other hippocampal cells, in addition to glutamatergic neurons, which explains the remaining full-length DLK detected.

      Our Ribo-seq of VGlut1-Cre; DLK(cKO)<sup>fl/fl</sup> detected remaining Dlk mRNAs lacking the floxed exon (Fig.S1C), which have several candidate ATGs at amino acid 223 and after (Fig.S1C1). We detected a very faint band for smaller molecular weight proteins on western blots, only when the membrane was exposed 5X longer using Pico PLUS Chemiluminescent Substrate (Thermo Scientific, 34580) and a Licor Odyssey XF Imager (revised Fig. S1B). This smaller molecular weight protein might be produced using any of the candidate ATGs, but would represent an N-terminal truncated DLK protein lacking the ATP binding site and ~1/4 of the kinase domain, i.e. not a functional kinase.

      The revised manuscript has an updated citation for DLK(cKO)<sup>fl/fl</sup>. Revised Fig.S1B includes images of western blots using anti-DLK antibodies under normal vs longer exposure. New Fig.S1C1 shows the effects of the floxed exon on DLK.

      (2) Why does DLK(OE) not kill CA3 neurons? The phenomenon is clear but there is no link to gene expression changes. In fact, the highlighted transcript in this work, Stmn4, changes in a DLK-dependent manner in CA3.

      We agree that this is a very interesting question not answered by our gene expression analysis. While we verified that Stmn4 expression levels correlate with the levels of DLK, we do not think that increased Stmn4 per se in DLK(iOE) is a major factor accounting for CA1 death vs CA3 survival. Several published studies have also reported regulation of Stmn4 mRNAs in other cell types, in the contexts of cell death (Watkins et al., 2013; Le Pichon et al., 2017) and axon regeneration and cytoskeleton disruption (Asghari Adib et al., 2024; DeVault et al., 2024; Hu et al., 2019; Shin et al., 2019). As Stmns have significant expression and functional redundancy, conventional knockdown or overexpression of an individual Stmn generally does not lead to detectable effects on cellular function. As CA3 neurons are widely known for their dense connections and show resilience to NMDA-mediated neurotoxicity (Sammons et al., 2024; Vornov et al., 1991), we speculate that the differential vulnerability of CA1 and CA3 under DLK(iOE) is a reflection of both their intrinsic properties, such as gene expression, and their circuit connections.

      In the revised manuscript, we have included the following statement on pg 18:

      ‘While our data does not pinpoint the molecular changes explaining why CA3 would show less vulnerability to increased DLK, we may speculate that DLK(iOE) induced signal transduction amplification may differ in CA1 vs CA3. CA1 genes appear to be more strongly regulated than CA3 genes, consistent with our observation that increased c-Jun expression in CA1 is greater than that in CA3. Other parallel molecular factors may also contribute to resilience of CA3 neurons to DLK(iOE), such as HSP70 chaperones, different JNK isoforms, and phosphatases, some of which showed differential expression in our RiboTag analysis of DLK(iOE) vs WT (shown in File S2. WT vs DLK(iOE) DEGs). Together with other genes that show dependency on DLK, the DLK and Jun regulatory network contributes to the regional differences in hippocampal neuronal vulnerability under pathological conditions.’

      Further we state in ‘Limitation of our study’ on pg 20:

      ‘Our analysis also does not directly address why CA3 neurons are less vulnerable to increased DLK expression. Future studies using cell-type specific RiboTag profiling and other methods at a refined time window will be required to address how DLK dependent signaling interacts with other networks underlying hippocampal regional neuron vulnerability to pathological insults.’

      We hope our data will stimulate continued interest in testable hypotheses in future studies.

      (3) Why are whole hippocampi analyzed to IP ribosome-associated mRNAs? The authors nicely show a differential effect of DLK on CA1 vs CA3, but then - at least according to their methods - lyse whole hippocampi to perform IP/sequencing. Their data are therefore a mix of cells where DLK does and does not change cell death. The key issue is whether DLK does/does not have an effect based on the expression changes it drives.

      At the time of planning the Ribo-Tag experiment several years ago, we focused on the hippocampal glutamatergic neurons. Due to technical difficulty in micro-dissecting individual hippocampal regions from this early timepoint, we opted to use whole hippocampi to isolate ribosome-associated mRNAs. We agree with the reviewer that it is important to sort out DLK-dependent general gene expression changes vs those specific to a particular cell type where DLK impacts its survival. With emerging CA1, CA3 and other cell-type specific Cre drivers and advanced RNAseq technology, we hope that our work will stimulate broad interest in these questions in future studies. 

      In the revised manuscript, we have included a new analysis comparing our Vglut1-RiboTag profiling (P15) with CamK2-RiboTag (for CA1) and Grik4-RiboTag (for CA3) (P42) published in Traunmüller et al., 2023 (GSE209870). We find that >80% of the top ranked genes in their CamK2-RiboTag (for CA1) and Grik4-RiboTag (for CA3) were detected in our VGlut1-RiboTag (revised methods and Supplemental Excel File S3). CA1-enriched genes tended to be expressed higher in DLK(cKO), compared to control, whereas CA3-enriched genes showed less significant correlation to DLK expression levels. Additionally, many genes known to specify CA1 fate do not show significant downregulation in DLK(iOE). This analysis, along with other data in our manuscript, is consistent with the idea that DLK does not regulate neuronal fate.

      In the revised manuscript, we presented this additional analysis in Fig. S6K-L, and expanded the text description on page 9:

      ‘Additionally, we compared our Vglut1-RiboTag datasets with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by (Traunmüller et al., 2023; GSE209870). We defined a list of genes enriched in CamK2-expressing CA1 neurons relative to Grik4-expressing CA3 neurons (CA1 genes), and those enriched in Grik4-expressing CA3 neurons (CA3 genes) (File S3). When compared with the entire list of Vglut1-RiboTag profiling in our control and DLK(cKO), we found CA1 genes tended to be expressed more in DLK(cKO) mice, compared to control (Fig.S6K), while CA3 genes showed a slight enrichment in control though the trend was less significant, and were less clustered towards one genotype (Fig.S6L). Moreover, many CA1 genes related to cell-type specification, such as FoxP1, Satb2, Wfs1, Gpr161, Adcy8, Ndst3, Chrna5, Ldb2, Ptpru, and Ntm, did not show significant downregulation when DLK was overexpressed. These observations imply that DLK likely specifically down-regulates CA1 genes both under normal conditions and when overexpressed, with a stronger effect on CA1 genes, compared to CA3 genes. Overall, the informatic analysis suggests that decreased expression of CA1 enriched genes may contribute to CA1 neuron vulnerability to elevated DLK, although it is also possible that the observed down-regulation of these genes is a secondary effect associated with CA1 neuron degeneration’.

      (4) Is the subtle decrease in synapse number (Bassoon/Homer co-loc.) in the DLK (OE) simply a function of neurons (and their synapses, presumably) having died? At the P15 time point that the authors choose because cell death is minimal, there is still a ~25% reduction in CA1 thickness (Figure 2B), which is larger than the ~15% change in synapses (Figure 5H) they describe.

      We thank the reviewer for the question. To address this, we have analyzed synapses in the CA1 region at P10 in DLK(iOE) mice when there was no detectable loss of neurons. At P10, we did not detect significant changes in Bassoon, Homer1, or colocalized puncta in CA1 (Fig.S11A-F). In P15 DLK(iOE) mice, Homer1 puncta were slightly smaller (Fig.5L) and showed a significant decrease in CA1 SR (Fig.5I).

      In the revised manuscript we have also redone our statistical analysis of synapses, using mice rather than ROIs as the unit of analysis (revised Fig. 5), as recommended by R3. We also analyzed synapses in CA3, and found no significant differences at P10 or P15 (Fig.S12). We would interpret the data to mean that the effects of DLK(OE) on synapses in CA1 may represent an early step in neuronal death. We hope that future studies will shed clarity on this question.

      Reviewer #2 (Public Review):

      This manuscript describes the impact of deleting or enhancing the expression of the neuronal-specific kinase DLK in glutamatergic hippocampal neurons using clever genetic strategies, which demonstrates that DLK deletion had minimal effects while overexpression resulted in neurodegeneration in vivo. To determine the molecular mechanisms underlying this effect, ribotag mice were used to determine changes in active translation which identified Jun and STMN4 as DLK-dependent genes that may contribute to this effect. Finally, experiments in cultured neurons were conducted to better understand the in vivo effects. These experiments demonstrated that DLK overexpression resulted in morphological and synaptic abnormalities.

      Strengths:

      This study provides interesting new insights into the role of DLK in the normal function of hippocampal neurons. Specifically, the study identifies:

      (1) CA1 vs CA3 hippocampal neurons have differing sensitivity to increased DLK signaling.

      (2) DLK-dependent signaling in these neurons is similar to but distinct from the downstream factors identified in other cell types, highlighted by the identification of STMN4 as a downstream signal.

      (3) DLK overexpression in hippocampal neurons results in signaling that is similar to that induced by neuronal injury.

      The study also provides confirmatory evidence that supports previously published work through orthogonal methods, which adds additional confidence to our understanding of DLK signaling in neurons. Taken together, this is a useful addition to our understanding of DLK function.

      We thank the reviewer for careful reading and positive comments.

      Weaknesses:

      There are a few weaknesses that limit the impact of this manuscript, most of which are pointed out by the authors in the discussion. Namely:

      (1) It is difficult to distinguish whether the changes in the translatome identified by the authors are DLK-dependent transcriptional changes, DLK-dependent post-transcriptional changes or secondary gene expression changes that occur as a result of the neurodegeneration that occurs in vivo. Additional expression analysis at earlier time points could be one method to address this concern.

      We appreciate the reviewer’s comment, and have performed new analysis on c-Jun and p-c-Jun levels in CA1, CA3, and DG in P10 DLK(OE) mice. Our data suggest that in CA3 elevations in p-c-Jun and c-Jun occur separately from cell death in a DLK-dependent manner, though the high elevation of both p-c-Jun and c-Jun in CA1 correlates with cell death.

      The data is presented in revised Fig.S7A,B, and described in revised text on pg 9-10:

      ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis.’

      Also, on pg.10:

      ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      (2) Related to the above, it is difficult to conclusively determine from the current data whether the changes in synaptic proteins observed in vivo are a secondary result of neuronal degeneration or a primary impact on synapse formation. The in vitro studies suggest this has the potential to be a primary effect, though the difference in experimental paradigm makes it impossible to determine whether the same mechanisms are present in vitro and in vivo.

      We appreciate the comment, which is related to R1 point 4. We have performed further analysis and revised the text on pg.12 with the following text:

      ‘To assess effects of DLK overexpression on synapses, we immunostained hippocampal sections from both P10 and P15, with age-matched littermate controls. Quantification of Bassoon and Homer1 immunostaining revealed no significant differences in CA1 SR and CA3 SR and SL in P10 mice of Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> and control (Fig.S11A-F, S12A-J). In P15, Bassoon density and size in CA1 SR were comparable in both mice (Fig 5G, H, K), while Homer1 density and size were reduced in DLK(iOE) (Fig.5G,I, L). Overall synapse number in CA1 SR was similar in DLK(iOE) and control mice (Fig.5J). Similar analysis on CA3 SR and SL detected no significant difference from control (Fig.S12M-V).’

      We would interpret the data to mean that the effects of DLK(OE) on synapses in CA1 may represent an early step in neuronal death. We hope that future studies will shed clarity on this question.

      Additionally, to address whether the same mechanisms are present in vitro, we have performed further analysis on cultured hippocampal neurons. As described in the Methods, we made hippocampal neuron cultures from P1 pups of the following crosses:

      For control: Vglut1<sup>Cre/+</sup> X Rosa26<sup>tdT/+</sup> 

      For DLKcKO: Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>  X Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>;Rosa26<sup>tdT/+</sup> 

      For DLKiOE: H11-DLK<sup>iOE/iOE</sup> X Vglut1<sup>Cre/+</sup>;Rosa26<sup>tdT/+</sup> 

      Dissociated cells from a given litter were pooled into the same culture. Because there were different proportions of neurons with our genotype of interest in each culture, it is not simple to know whether DLK was causing significant cell death.

      On pg 13, we stated our observation:

      ‘We did not notice an obvious effect of DLK(iOE) or DLK(cKO) on neuron density in cultures at DIV2. To assess neuronal type distribution in our cultures, we immunostained DIV14 neurons with antibodies for Satb2, as a CA1 marker (Nielsen et al., 2010), and Prox1, as a marker of DG neurons (Iwano et al., 2012). We did not observe significant differences in the proportion of cells labeled with each marker in DLK(cKO) or DLK(iOE) cultures (Fig.S13E). These data are consistent with the idea that DLK signaling does not have a strong role in neuron-type specification both in vivo and in vitro’.

      (3) The phenotype of DLK cKO mice is very subtle (consistent with previous reports) and while the outcome of increased DLK levels is interesting, the relevance to physiological DLK signaling is less clear. What does seem possible is that increased DLK may phenocopy other neuronal injuries but there are no real comparisons to directly address this in the manuscript. It would be helpful for the authors to provide this analysis as well as a table with all of the translational changes along with fold changes.

      Thank you for the suggestion. The fold changes of genes showing significantly altered expression in DLK(cKO) and DLK(iOE) are provided in the Excel files (Supplementary Excel File S1. WT vs DLK(cKO) DEGs and File S2. WT vs DLK(iOE) DEGs, highlighted columns B and F).

      On pg 6, we revised the text as follows to include a comparison of DLK levels in other physiological conditions and in our mice:

      ‘Several studies have reported that DLK protein levels increase under a variety of conditions, including optic nerve crush (Watkins et al., 2013), NGF withdrawal (~2 fold) (Huntwork-Rodriguez et al., 2013; Larhammar et al., 2017), and sciatic nerve injury (Larhammar et al., 2017). Induced human neurons show increased DLK abundance about ~4 fold in response to ApoE4 treatment (Huang et al., 2019). Increased expression of DLK can lead to its activation through dimerization and autophosphorylation (Nihalani et al., 2000)’.

      And,

      ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      In Discussion, we state (pg. 16): ‘The levels of DLK in our DLK(iOE) mice model appear comparable to those reported under traumatic injury and chronic stress.’

      (4) For the in vivo experiments, it is unclear whether multiple sections from each animal were quantified for each condition. More information here would be helpful and it is important that any quantification takes multiple sections from each animal into account to account for natural variability.

      We apologize that this was unclear in the original manuscript.

      In the revised methods, under Confocal imaging and quantification (pg 33), we stated: “For brain tissue, three sections per mouse were imaged with a minimum of three mice per genotype for data analysis.”

      In revised figure legends, we made it clear that multiple sections from each animal have been used for quantification in all instances, i.e. “Each dot represents averaged thickness from 3 sections per mouse, N≥4 mice/genotype per timepoint.” 

      In Fig.1F-H: “Each dot represents averaged intensity from 3 sections per mouse”

      In Fig.S3B “Data points represent individual mice, averages taken across 3 sections per mouse”
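
      The per-animal averaging described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the genotype labels, mouse IDs, and values are invented, and `per_mouse_means` is a hypothetical helper. It only shows the general idea of collapsing section-level measurements to one value per mouse, so that the mouse rather than the section is the unit entering any statistical comparison.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (genotype, mouse_id, value from one section).
# All numbers are invented for illustration only.
measurements = [
    ("control", "m1", 60.0), ("control", "m1", 62.0), ("control", "m1", 58.0),
    ("control", "m2", 64.0), ("control", "m2", 66.0), ("control", "m2", 65.0),
    ("iOE", "m3", 40.0), ("iOE", "m3", 42.0), ("iOE", "m3", 44.0),
    ("iOE", "m4", 35.0), ("iOE", "m4", 37.0), ("iOE", "m4", 36.0),
]


def per_mouse_means(rows):
    """Collapse section-level values to one mean per mouse, grouped by genotype."""
    by_mouse = defaultdict(list)
    for genotype, mouse, value in rows:
        by_mouse[(genotype, mouse)].append(value)

    by_genotype = defaultdict(dict)
    for (genotype, mouse), values in by_mouse.items():
        by_genotype[genotype][mouse] = mean(values)
    return {genotype: dict(mice) for genotype, mice in by_genotype.items()}


summary = per_mouse_means(measurements)
for genotype, mice in summary.items():
    # n is the number of mice, not the number of sections.
    print(genotype, "n mice =", len(mice), "per-mouse means =", mice)
```

      Any downstream test would then take the per-mouse means as its data points, which keeps the sample size equal to the number of animals.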

      Reviewer #3 (Public Review):

      Dr Jin and colleagues revisit DLK and its established multifactorial roles in neuronal development, axonal injury, and neurodegeneration. The ambitious aim here is to understand the DLK-dependent gene network in the brain and, to pursue this, they explore the role of DLK in hippocampal glutamatergic neurons using conditional knockout and induced overexpression mice. They produce evidence that dorsal CA1 and dentate gyrus neurons are vulnerable to elevated expression of DLK, while CA3 neurons appear unaffected. Then they identify the DLK-dependent translatome, characterized by conserved molecular signatures and cell-type specificity. Their evidence suggests that increased DLK signaling is associated with possible STMN4 disruptions to microtubules, among other effects. They also produce evidence on cultured hippocampal neurons showing that expression levels of DLK are associated with changes in neurite outgrowth, axon specification, and synapse formation. They posit that downstream translational events related to DLK signaling in hippocampal glutamatergic neurons are a generalizable paradigm for understanding neurodegenerative diseases.

      Strengths

      This is an interesting paper based on a lot of work and a high number of diverse experiments that point to the pervasive roles of DLK in the development of select glutamatergic hippocampal neurons. One should applaud the authors for their work in constructing sophisticated molecular cre-lox tools and their expert Ribotag analysis, as well as technical skill and scholarly treatment of the literature. I am somewhat more skeptical of interpretations and conclusions on spatial anatomical selectivity without stereological approaches and also going directly from (extremely complex) Ribotag profiling patterns to relevance based on immunohistochemistry and no additional interventions to manipulate (e.g. by knocking down or blocking) their top Ribotag profile hits. Also, it seems to this reviewer that major developmental claims in the paper are based on gene translational profiling dependent on DLK expression, not DLK activation, despite some evidence in the paper that there is a correlation between the two. Therefore, observed patterns and correlations may or may not be physiologically or pathologically relevant. Generalizability to neurodegenerative diseases is an overreach not justified by the scope, approach, and findings of the paper.

      We thank the reviewer for the encouraging and constructive comments on the manuscript.

      Weaknesses and Suggestions:

      The authors state that the rationale for the translatomic studies is "to gain molecular understanding of gene expression associated with DLK in glutamatergic neurons" and to characterize the "DLK-dependent molecular and cellular network". However, a problem with the experimental design is the selection of an anatomical region at a time point characterized by active neurodegeneration. Therefore, it is not straightforward to attribute the differentially expressed genes or pathways to DLK overexpression itself, because the changes could be due to processes related to neurodegeneration. Indeed, the authors find enrichment of signals related to pathways involved in extracellular matrix organization, apoptosis, unfolded protein responses, the complement cascade, DNA damage responses, and depletion of signals related to mitochondrial electron transport, etc., all of which could be the consequence of neurodegeneration regardless of cause. A more appropriate design to discover DLK-dependent pathways might be to look at a region and/or a time point that is not confounded by neurodegeneration.

      We appreciate the reviewer’s comment. We included our thoughts in ‘Limitation of the study’ (pg 20):

      ‘Future studies using cell-type specific RiboTag profiling and other methods at a refined time window will be required to address how DLK dependent signaling interacts with other networks underlying hippocampal regional neuron vulnerability to pathological insults.’

      In a related vein, the authors ask "if the differentially expressed genes associated with DLK(iOE) might show correlation to neuronal vulnerability" and, to answer this question, they select the set of differentially expressed genes after DLK overexpression and assess their expression patterns in various regions under normal conditions. It looks to me that this selection is already confounded by neurodegeneration which could be the cause for their downregulation. Therefore, such gene profiles may not be directly linked to neuronal vulnerability. A similar issue also relates to the conclusion that "...the enrichment of DLK-dependent translation of genes in CA1 suggests that the decreased expression of these genes may contribute to CA1 neuron vulnerability to elevated DLK".

      We agree with the reviewer’s concern that it is difficult to separate neurodegenerative consequences from changes caused by DLK based solely on our translatomic studies of P15 DLK(iOE) mice. As noted in our responses to reviewer 1 (point 4) and reviewer 2 (point 1), we have included new analysis of P10 mice (Fig.S7A,B), when neurons did not show detectable signs of degeneration.

      We consider that several lines of evidence support the idea that some genes differentially expressed in DLK(iOE) vs control are likely specific to increased DLK signaling.

      First, the genes identified in DLK(iOE) vs control represent a small set of genes (260), which is comparable to other DLK dependent datasets (Asghari Adib et al., 2024) but shows cell-type specificity.

      Second, our analysis using rank-rank hypergeometric overlap (RRHO) detects a significant correlation between upregulated genes in DLK(iOE) and downregulated genes in DLK(cKO), and vice versa, suggesting that expression of a similar set of genes depends on DLK (Fig.3C, S6C-E). Consistently, GO term analysis using the list of genes coordinately regulated by DLK, derived from our RRHO analysis, identifies GO terms for up- and downregulated genes similar to those obtained using DLK(iOE)-RiboTag data alone. SynGO analysis of DLK(iOE)-regulated genes and DLK(cKO)-regulated genes also identified similar synaptic processes among the significantly regulated genes (Fig.3F and S6J).

      Third, we performed additional analysis comparing our Vglut1-RiboTag dataset with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by Traunmüller et al. (2023; GSE209870). We observed >80% overlap among the top ranked genes (revised Methods). We described this analysis on pg 9 and in Fig. S6K-L (and Supplemental Excel File S3):

      ‘Additionally, we compared our Vglut1-RiboTag datasets with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by (Traunmüller et al., 2023; GSE209870). We defined a list of genes enriched in CamK2-expressing CA1 neurons relative to Grik4-expressing CA3 neurons (CA1 genes), and those enriched in Grik4-expressing CA3 neurons (CA3 genes) (File S3). When compared with the entire list of Vglut1-RiboTag profiling in our control and DLK(cKO), we found CA1 genes tended to be expressed more in DLK(cKO) mice, compared to control (Fig.S6K), while CA3 genes showed a slight enrichment in control though the trend was less significant, and were less clustered towards one genotype (Fig.S6L). Moreover, many CA1 genes related to cell-type specification, such as FoxP1, Satb2, Wfs1, Gpr161, Adcy8, Ndst3, Chrna5, Ldb2, Ptpru, and Ntm, did not show significant downregulation when DLK was overexpressed. These observations imply that DLK likely specifically down-regulates CA1 genes both under normal conditions and when overexpressed, with a stronger effect on CA1 genes, compared to CA3 genes. Overall, the informatic analysis suggests that decreased expression of CA1 enriched genes may contribute to CA1 neuron vulnerability to elevated DLK, although it is also possible that the observed down-regulation of these genes is a secondary effect associated with CA1 neuron degeneration.’
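
      The rank-rank hypergeometric overlap (RRHO) comparison mentioned in the second point above can be sketched in simplified form. This is an illustrative toy, not the RRHO implementation used in the manuscript: the gene names are placeholders, `hypergeom_sf` and `rrho_grid` are hypothetical helpers, and published RRHO tools add features (signed -log10 p-values, multiple-testing correction) that are omitted here. The sketch only shows the core step: for the top i genes of one ranked list and the top j genes of another, ask how surprising their overlap is under a hypergeometric null.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def rrho_grid(ranked_a, ranked_b, step):
    """Overlap p-value for the top-i genes of list A vs the top-j genes of list B.

    Both lists must contain the same genes. Returns {(i, j): p_value}.
    """
    N = len(ranked_a)
    grid = {}
    for i in range(step, N, step):
        top_a = set(ranked_a[:i])
        for j in range(step, N, step):
            overlap = len(top_a & set(ranked_b[:j]))
            grid[(i, j)] = hypergeom_sf(overlap, N, i, j)
    return grid


# Placeholder gene names; list A is ranked from most upregulated in one
# condition, list B from most downregulated in the other condition.
genes = [f"g{i}" for i in range(10)]
concordant = rrho_grid(genes, genes, step=2)
discordant = rrho_grid(genes, list(reversed(genes)), step=2)

print("top-2 vs top-2, identical ranking:", concordant[(2, 2)])
print("top-2 vs top-2, reversed ranking:", discordant[(2, 2)])
```

      A small p-value concentrated in the corner where both thresholds are strict indicates that the most strongly regulated genes in one dataset are also the most strongly (oppositely) regulated genes in the other.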

      To understand the role and relevance of the DLK overexpression model, there should be a discussion of to what extent it corresponds to endogenous levels of DLK expression or DLK-MAPK pathway activation under baseline or pathological conditions.

      We appreciate the suggestion, which is similar to R2 point 3. We have revised the text and discussion to include how DLK levels may be altered in other physiological conditions vs our mice.

      Pg. 6: ‘Several studies have reported that DLK protein levels increase under a variety of conditions, including optic nerve crush (Watkins et al., 2013), NGF withdrawal (~2 fold) (Huntwork-Rodriguez et al., 2013; Larhammar et al., 2017), and sciatic nerve injury (Larhammar et al., 2017). Induced human neurons show increased DLK abundance about ~4 fold in response to ApoE4 treatment (Huang et al., 2019). Increased expression of DLK can lead to its activation through dimerization and autophosphorylation (Nihalani et al., 2000)’.

      And,

      ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      In Discussion (pg. 16): ‘The levels of DLK in our DLK(iOE) mice model appear comparable to those reported under traumatic injury and chronic stress.’

      The authors posit that "dorsal CA1 neurons are vulnerable to elevated DLK expression, while neurons in CA3 appear largely resistant to DLK overexpression". This statement assumes that DLK expression levels start at a similar baseline among regions. Do the authors have any such data? Ideally, they should show whether DLK expression and p-c-Jun (as a marker of downstream DLK signaling) are the same or different across regions in both WT and overexpression mice. For example, what are the DLK/p-c-Jun expression levels in regions other than CA1 in Supplementary Figures 2-3 and how do they compare with each other? Normalization to baseline for each region does not allow such a comparison. Also, in Supplementary Figure 6, analyses and comparisons between regions are done at a time point when degeneration has already started. Ideally, these should be done at P10.

      We thank the reviewer for raising these points. In the revised manuscript we have included protein expression analysis of DLK (Fig S3), c-Jun, and p-c-Jun at P10 (Fig. S7).

      We provided a quantification of DLK immunostaining intensity in CA1 and CA3 in Fig.S3D,E and found roughly comparable levels between regions.

      Pg. 6: ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      We provided our quantifications without normalization to baseline in each region for c-Jun and p-c-Jun, and revised the text accordingly:

      Pg. 9-10: ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis’.

      Pg. 10: ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      Illustration of proposed selective changes in hippocampal sector volume needs to be very carefully prepared in view of the substantial claims on selective vulnerability. In 2A under P15 and especially P60, it is difficult to see the difference - this needs lower magnification and a lot of care that anteroposterior levels are identical because hippocampal sector anatomy and volumes of sectors vary from level to level. One wonders if the cortex shrinks, too. This is important.

      Thank you for raising the point. We have provided images showing the anteroposterior level in Fig.S2A-C. We noticed that the cortex in DLK(iOE) mice becomes thinner, along with expansion of the ventricles in some animals at later timepoints (Fig.S2C).

      One cannot be sure that there is selective death of hippocampal sectors with DLK overexpression versus, say, rearrangement of hippocampal architecture. One may need stereological analysis, otherwise this substantial claim appears overinterpreted.

      We appreciate the comment.

      In the revised manuscript, we included a new supplemental figure (Fig. S2) showing lower magnification images of coronal sections, and used cautionary wording, such as ‘CA3 is less vulnerable, compared to CA1’, to minimize the impression of over-interpretation. By NeuN staining at P10, P15, and P60, we did not observe a detectable difference in overall hippocampal architecture, apart from the noted cell death in CA1 and DG and the associated thinning of each of the layers. At 46 weeks, some animals showed differences in the overall shape of the dorsal hippocampus, though this appeared to reflect a disproportionately large CA3 region compared to other regions (Fig S2). Increased GFAP staining (Fig.S5A-C) was detected in CA1 but not in CA3, and microglia labeled by IBA1 staining (Fig.S5E) also displayed less reactivity in CA3, compared to CA1. Thus, based on NeuN staining, GFAP staining, IBA1 staining and analysis of the differentially regulated genes, we infer that the effect of DLK(iOE) on CA1 differs from its effect on CA3.

      Is the GFAP excess reflective of neuroinflammation? What do microglial markers show? The presence of neuroinflammation does not bode well with apoptosis. Speaking of which, TUNEL in one cell in Supplementary Figure 4E is not strong evidence of a more widespread apoptotic event in CA1.

      We have included staining data for the microglia marker IBA1. Both GFAP and IBA1 showed evidence of reactivity particularly in the CA1 region (S5A-E), supporting the differential vulnerability in different regions, though whether cell death is primarily due to apoptosis is unclear.

      We agree that our data showing sparse TUNEL staining at P15 (Fig S5F,G) do not rule out other mechanisms of cell death. We have included this in our limitations (pg.20): “While we find evidence for apoptosis, other forms of cell death may also occur.”

      In several places in the paper (as illustrated in Figure 4B, Supplementary Figure 2B, etc.): the unit of biological observation in animal models is typically not a cell, but an organism, in which averaged measures are generated. This is a significant methodological problem because it is not easy to sample neurons without involving stereological methods. With the approach taken here, there is a risk that significance may be overblown.

      We appreciate the reviewer’s point. We used the same region for RNAscope quantification and performed it blind to genotype when possible. We revised the graphs to show mean values for individual mice in Fig.4B, 4C, and Fig.S3B (previously Fig.S2B).

      Other Comments and Questions:

      Supplementary Figure 9: The authors state that data points are shown for individual ROIs - ideally, they should also show averages for biological replicates. Can the authors confirm that statistical analyses are based on biological replicates (mice) and not ROIs?

      We have revised the graphs to show averages from individual mice in Fig.5B-D, Fig.5E-F (previously Fig.S9G-I), Fig.5H-J, Fig.5K-L (previously Fig.S9J-L), and Fig.S10B,C,E,F (previously Fig.S9B,C,E,F). The statistical analyses are based on biological replicates (mice).

      For in vitro experiments, what is the effect of DLK overexpression on neuronal viability and density? Could these variables confound effects on synaptogenesis/synapse maturation?

      As described in the Methods, we made hippocampal neuron cultures from P1 pups of the following crosses:

      For control: Vglut1<sup>Cre/+</sup> X Rosa26<sup>tdT/+</sup> 

      For DLKcKO: Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>  X Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>;Rosa26<sup>tdT/+</sup> 

      For DLKiOE: H11-DLK<sup>iOE/iOE</sup> X Vglut1<sup>Cre/+</sup>;Rosa26<sup>tdT/+</sup> 

      Dissociated cells from a given litter were pooled into the same culture. Because the proportion of neurons with our genotype of interest differed between cultures, it is not straightforward to determine whether DLK was causing significant cell death.

      On pg 13, we stated our observation:

      ‘We did not notice an obvious effect of DLK(iOE) or DLK(cKO) on neuron density in cultures at DIV2. To assess neuronal type distribution in our cultures, we immunostained DIV14 neurons with antibodies for Satb2, as a CA1 marker (Nielsen et al., 2010), and Prox1, as a marker of DG neurons (Iwano et al., 2012). We did not observe significant differences in the proportion of cells labeled with each marker in DLK(cKO) or DLK(iOE) cultures (Fig.S13E). These data are consistent with the idea that DLK signaling does not have a strong role in neuron-type specification both in vivo and in vitro’.

      We cannot rule out that variable factors in our cultures confound effects on synaptogenesis/synapse maturation, and hope that future studies will clarify this.

      Correlations between c-jun expression and phosphorylation are extremely important and need to be carefully and convincingly documented. I am a bit concerned about Supplementary Figure 6 images, especially 6B-CA1 (no difference between control and KO, too small images) and 6D (no p-c-Jun expression at all anywhere in the hippocampus at P15?).

      At P10, P15, and P60 we stained for p-c-Jun using the Rabbit monoclonal p-c-Jun (Ser73) (D47G9) antibody from Cell Signaling (cat# 3270) at a 1:200 dilution and imaged using an LSM800 confocal microscope with a 20x objective. We observed p-c-Jun to be quite low generally in control animals. We have replaced the images in Fig.S7F (previously S6D), and adjusted the brightness/contrast to enable better visualization of the low signal in Fig.S7B,D,F (previously Fig.S6B,D).

      We revised our text to present the data carefully as stated above:

      Pg. 9-10: ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis’.

      Pg. 10: ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      Recommendations for the authors:

      Several major and minor reservations were raised. The major issues are the need for more information about the over-expression of DLK and a need to extrapolate to an in vivo condition with DLK. A considerable amount of useful information is presented with some very nicely done experiments but it is not yet a coherent or integrated story. The lack of impact of DLK overexpression in some neurons is perhaps the most impactful observation of the study and would be great to have more information around the differential transcriptional/signaling response in these cell types. There is also a need for more experimental details and to address several questions about the mouse genetic and translatome analysis. They are valid concerns that require attention by the authors.

      We thank the editors and reviewers for their thoughtful evaluation and suggestions. We hope that the editors and reviewers find that the new data and text changes in our revised manuscript, along with the above point-by-point response, have addressed the concerns and strengthened our findings.

      Minor points:

      (1) The authors state that deletion of DLK has no effect on CA1 at 1yr, however, the image of CA1 in Figure S1D shows substantially fewer NeuN+ neurons. Is this a representative field of view?

      We have re-examined the images and observed no effect on hippocampal morphology at 1 yr. We now include representative images in the revised Fig S1D.

      (2) Is the DLK protein section staining in Figure 2C a real signal? The staining looks like speckles and is purely somatic. Axonal staining is widely expected based on the literature and the authors' own work. There should be a specificity control.

      To our knowledge, axonal staining of DLK reported in the literature is mostly based on cultured DRG neurons. In addition to the reported axonal localization, DLK is present in the cell soma, near the Golgi (Hirai et al., 2002), and in the post-synaptic density (Pozniak et al., 2013).

      In the revised manuscript, we addressed this point by including controls with no primary antibody, and by using an antibody against the closely related kinase, LZK. These additional data are shown in Fig.S3C,D (previously Fig.S2C), supporting that DLK protein staining represents real signal. At P10 and P15, DLK immunostaining around CA3 showed axonal staining of the mossy fibers, as well as staining in the soma and dendritic layers (Fig.S3C,D). A similar pattern was also seen in primary cultured neurons (Fig 6A).

      (3) The protein expression of DLK in the transgenic overexpressor (Figure S7C) looks, to the resolution of this blot, to be at least 50kD heavier than 'WT' DLK. Can the authors explain this discrepancy?

      The Cre-induced DLK(iOE) transgene has T2A and tdTomato in frame at the C-terminus of DLK. It is known that T2A ‘self-cleavage’ is often incomplete. DLK-T2A-tdTomato would be about 50 kD bigger than WT DLK. We now include the transgene design in revised Fig S1D, and also state in the figure legend of Fig.S8C (previously S7C) that ‘Larger molecular weight band of DLK in Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> would match the predicted molecular weight of DLK-T2A-tdTomato if T2A-peptide induced ‘self-cleavage’ due to ribosomal skipping is ineffective (Fig.S1D).’

      (4) Expression changes in DLK affect various aspects of neurites in CA1 cultures (Figure 6), and changes in DLK also modestly affect STMN4 (and 2, perhaps indirectly) levels (Figure S7C), but there is no indication that DLK acts via STMN4 to cause these changes. It is not clear what to make of these data. Of note, Stmn4 levels change in response to DLK in CA3, without DLK affecting cell death in this region.

      We appreciate and agree with the comment. Other studies (Asghari Adib et al., 2024; DeVault et al., 2024; Hu et al., 2019; Larhammar et al., 2017; Le Pichon et al., 2017; Shin et al., 2019; Watkins et al., 2013) reported expression changes in Stmn4 mRNAs in other cell types and cellular contexts, which appeared to depend on DLK. Hippocampal neurons express multiple Stmns (Fig.S8A). While we present our analysis of the effects of DLK dosage on Stmn4, and also Stmn2, we do not think that DLK-induced changes in Stmn4 expression per se are a major factor underlying CA1 cell death vs CA3 survival.

      In the revised manuscript, we addressed this point in ‘Limitation of our study’ (pg 20):

      ‘Additional experiments will be needed to elucidate in vivo roles of STMN4 and its interaction with other STMNs’.

    1. eLife Assessment

      This study presents important findings on the function of enteric glia expressing proteolipid protein 1 (PLP1+ glia). The evidence supporting the claims of the authors is solid, although the inclusion of additional data showing the mechanisms by which PLP1+ enteric glia acts on Paneth cells would have strengthened the study. The work will be of interest to colleagues studying intestinal biology.

    2. Reviewer #1 (Public review):

      The role of enteric glial cells in regulating intestinal mucosal functions at steady state has been a matter of debate in recent years. Enteric glial cell heterogeneity and related methodological differences likely underlie the contrasting findings obtained by different laboratories. Here, Prochera and colleagues used Plp1-CreERT2 driver mice to deplete the vast majority of enteric glia from the gut, and performed an elegant set of transcriptomic, microscopic and biochemical assays to examine the impact of enteric glia loss. It was found that enteric glia depletion has very limited effects on the transcriptome of gut cells 11 days after tamoxifen treatment (used to induce Diphtheria Toxin A expression in the majority of enteric glia including those present in the mucosa) and, by extension, only minimal impact on cells of the intestinal mucosa specifically. Interestingly, in the colon (where Paneth cells are not present) they did observe transcriptomic changes related to Paneth cell biology. Although no overt gene expression alterations were found in the small intestine, including in Paneth cells, morphological, ultrastructural and functional changes were detected in the Paneth cells of enteric glia-depleted mice. In addition, and likely related to impaired Paneth cell secretory activity, enteric glia-depleted mice also show alterations in intestinal microbiota composition. This is an excellent study that convincingly demonstrates a role for enteric glia in supporting Paneth cells of the intestinal mucosa, suggesting that enteric glial cells shape host-microbiome interactions via the regulation of Paneth cell homeostasis.

    3. Reviewer #2 (Public review):

      This is an excellent and timely study from the Rao lab investigating the interactions of enteric glia with the intestinal epithelium. Two early studies in the late 1990s and early 2000s had previously suggested that enteric glia play a pivotal role in control of the intestinal epithelial barrier, as their ablation using mouse models resulted in severe and fatal intestinal inflammation. However, it was later identified that these inflammatory effects could have been an indirect product of the transgenic mouse models used, rather than due to the depletion of enteric glia. In previous studies from this lab, the authors had identified expression of PLP1 in enteric glia, and its use in CRE driver lines to label and ablate enteric glia.

      In the current paper, the authors carefully examine the role of enteric glia by first identifying that PLP1-creERT2 is the most useful driver to direct enteric glial ablation, in terms of the quantity of glial cells targeted, their proximity to the intestinal epithelium, and the relevance for human studies (GFAP expression is rather limited in human samples in comparison). They examined gene expression changes in different regions of the intestine using bulk RNA-seq following ablation of enteric glia by driving expression of diphtheria toxin A (PLP1-creERT2;Rosa26-DTA). Alterations in gene expression were observed in different regions of the gut, with specific effects in different regions. Interestingly, while there were gene expression changes in the epithelium, there were limited changes to the proportions of different epithelial cell types identified using immunohistochemistry in control vs glial-ablated mice. The authors then focused on investigation of Paneth cells in the ileum, identifying changes in the ultrastructural morphology and lysozyme activity. In addition, they identified alterations in gut microbiome diversity. As Paneth cells secrete antimicrobial peptides, the authors conclude that the changes in gut microbiome are due to enteric glia-mediated impacts on Paneth cell activity.

      Overall, the study is excellent and delves into the different possible mechanisms of action, including investigation of changes in enteric cholinergic neurons innervating the intestinal crypts. The use of different CRE-drivers to target enteric glial cells has led to varying results in the past, and the authors should be commended on how they address this in the Discussion.

      Comments on the latest version:

      Thanks to the authors for addressing my concerns. The additional stratification of male vs female microbiome data was very helpful.

    1. eLife Assessment

      This study on mouse Ly49 receptors expressed on natural killer (NK) cells shows that Ly49A, in the presence of the corresponding MHC Class I allele, can lead to NK cell licensing, thereby providing valuable insights into the mechanisms of NK cell modulation by Ly49 receptors. The work may have significant implications for studies of human NK cells expressing killer-cell immunoglobulin-like receptors (KIR) and of other NK cells. Overall, the study was well-developed with convincing evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The article by Piersma et al. aims to reduce the complex process of NK cell licensing to the action of a single inhibitory receptor for MHC class I. This is achieved using a mouse strain lacking all of the Ly49 receptors expressed by NK cells and inserting the Ly49a gene into the Ncr1 locus, leading to expression on the majority of NK cells.

      Strengths:

      The mouse model used represents a precise deletion of all NK-expressed genes within the Ly49 cluster. Re-introduction of the Ly49a gene into the Ncr1 locus allows expression by most NK cells. Convincing effects of Ly49a expression on in vitro activation and in vivo killing assay are shown.

      Weaknesses:

      The choice of Ly49a provides a clear picture of H-2Dd recognition by this Ly49. It would be valuable to perform additional studies investigating Ly49c and Ly49i receptors for H-2b. This is of interest because there are reports indicating that Ly49c may not be a functional receptor in B6 mice due to strong cis interactions. Investigation of the Ly49c and Ly49i receptors in this model would be the basis of future studies that are beyond the scope of the current report.

      This work generates an excellent mouse model for the study of NK cell licensing by inhibitory Ly49s that will be useful for the community. It provides a platform whereby the functional activity of a single Ly49 can be assessed.

      Comments on revisions: No additional concerns

    3. Reviewer #2 (Public review):

      Piersma et al. continue to work on deciphering the role and function of Ly49 NK cell receptors. This manuscript shows that a single inhibitory Ly49 receptor is sufficient to license NK cells and eliminate MHC-I-deficient target cells in mice. In short, they refined the mouse model ∆Ly49-1 (Parikh et al., 2020) into the Ly49KO model in which all Ly49 genes are disrupted. Using this model, they confirmed that NK cells from Ly49KO mice cannot be licensed, produce lower levels of IFN-gamma, and cannot reject MHC-I-deficient cells. To study the effect of a single Ly49 receptor in the function of NK cells, the authors backcrossed Ly49KO mice to H-2Dd transgenic KODO (D8-KODO) Ly49A knock-in mice in which a single inhibitory Ly49A receptor that recognizes H-2Dd ligands is expressed. By doing so, they demonstrate that a single inhibitory Ly49 receptor expressed by all NK cells is sufficient for licensing and missing-self killing.

      While the results of the study are largely consistent with the conclusions, it is important to address some discrepancies. For instance, in the title of Figure 1, the authors state that NK cells in Ly49KO mice compared to WT mice have a less mature phenotype, which is not consistent with the corresponding text in the Results section (lines 170-171) that states there is no difference in maturation. These differences are not evident in Figure 1, panel D. It is crucial to acknowledge these inconsistencies to ensure a comprehensive understanding of the research findings.

      In the legend of Figure 2, the text related to panel C indicates the use of dyes to label the splenocytes, and CFSE, CTV, and CTFR were mentioned. However, only CTV and CTFR are shown on the plots and mentioned in the corresponding text in the Results section. Similarly, in the legend of Figure 4, which is related to panel C, the authors write that splenocytes were differentially labeled with CFSE and CTV as indicated; however, in Figure 4C and the Results section text, there is no mention of CFSE.

      The authors should clarify why they assume that KLRG1 expression is influenced by the expression of inhibitory Ly49 receptors and not by manipulations on chromosome 6, where the genes for both KLRG1 and Ly49 receptors are located. However, a better explanation for the possible influence of other inhibitory NK cell receptors still needs to be included. In the study by Zhang et al. (doi: 10.1038/s41467-019-13032-5), the authors showed the synergized regulation of NK cell education by the NKG2A receptor and the specific Ly49 family members. Although in this study, Piersma and colleagues show the control of MHC-I deficient cells by Ly49A+ NKG2A- NK cells in Figure 4, this receptor is not mentioned in the Results or in the Discussion section, so its role in this story needs to be clarified. Therefore, the reader would benefit from more information regarding NKG2A receptor and NKG2A+/- populations in their results.

      Comments on revisions: The authors have successfully answered all my questions and edited the manuscript accordingly.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Piersma et al. successfully generated a mouse model with all Ly49 genes knocked out, resulting in the complete absence of Ly49 receptor expression on the cell surface. The absence of Ly49 expression led to the loss of NK cell education/licensing and consequently, a failure in responsiveness against missing-self target cells. The authors demonstrate the restoration of NK cell licensing by knocking in a single Ly49 gene, Ly49A, in a mouse expressing the H-2Dd ligand for this receptor, which is a novel and important finding.

      Strengths:

      The authors established a novel mouse model enabling them to have a clean and thorough study on the function of Ly49 on NK cell licensing. Also, by knocking in a single Ly49, they were able to investigate the function of a given Ly49 receptor excluding the "contamination" of co-expression of any other Ly49 genes. The experimental design and data interpretation were logically clear and the evidence was solid.

      Weaknesses:

      The mouse model was somewhat genetically similar to that of a previous study. The experimental work and findings partially overlap with the previous work by Zhang et al. (2019), who also performed knockout of the entire Ly49 locus in mice and demonstrated that loss of NK responsiveness was due to the removal of inhibitory, and not activating, Ly49 genes.

      Potential achievements and discussions: The mouse model developed by the authors holds great potential for advancing NK cell functional studies, particularly regarding the regulation of NK cell functions through receptor-ligand interactions. Moreover, it provides a valuable tool for investigating NK cell education and the development of checkpoint inhibitors. These applications could significantly contribute to the broader research efforts in cancer therapy utilizing NK cells.

      Comments on revisions: The authors have successfully addressed all the concerns raised in my previous feedback. They have significantly improved the logical structure, making it clearer and more coherent. Additionally, they have ensured consistency in the use of specific terminology throughout the manuscript. The substantial revisions and re-writing efforts are commendable and have greatly enhanced the overall quality of the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The article by Piersma et al. aims to reduce the complex process of NK cell licensing to the action of a single inhibitory receptor for MHC class I. This is achieved using a mouse strain lacking all of the Ly49 receptors expressed by NK cells and inserting the Ly49a gene into the Ncr1 locus, leading to expression on the majority of NK cells.

      Strengths:

      The mouse model used represents a precise deletion of all NK-expressed genes within the Ly49 cluster. The re-introduction of the Ly49a gene into the Ncr1 locus allows expression by most NK cells. Convincing effects of Ly49a expression on in vitro activation and in vivo killing assay are shown.

      Weaknesses:

      The choice of Ly49a provides a clear picture of H-2D<sup>d</sup> recognition by this Ly49. It would be valuable to perform additional studies investigating Ly49c and Ly49i receptors for H-2b. This is of interest because there are reports indicating that Ly49c may not be a functional receptor in B6 mice due to strong cis interactions.

      We agree with the reviewer that it will be important to extend our findings to H-2b haplotypes with individual cognate Ly49 receptors (Ly49C and Ly49I). While these experiments are subject of our ongoing studies, they are beyond the scope of the current manuscript considering the significant time, effort and cost to generate these new Ly49C and Ly49I knockin mice.

      This work generates an excellent mouse model for the study of NK cell licensing by inhibitory Ly49s that will be useful for the community. It provides a platform whereby the functional activity of a single Ly49 can be assessed.

      Reviewer #2 (Public review):

      Piersma et al. continue to work on deciphering the role and function of Ly49 NK cell receptors. This manuscript shows that a single inhibitory Ly49 receptor is sufficient to license NK cells and eliminate MHC-I-deficient target cells in mice. In short, they refined the mouse model ∆Ly49-1 (Parikh et al., 2020) into the Ly49KO model in which all Ly49 genes are disrupted. Using this model, they confirmed that NK cells from Ly49KO mice cannot be licensed, produce lower levels of IFN-gamma, and cannot reject MHC-I-deficient cells. To study the effect of a single Ly49 receptor in the function of NK cells, the authors backcrossed Ly49KO mice to H-2D<sup>d</sup> transgenic KODO (D8-KODO) Ly49A knock-in mice in which a single inhibitory Ly49A receptor that recognizes H-2D<sup>d</sup> ligands is expressed. By doing so, they demonstrate that a single inhibitory Ly49 receptor expressed by all NK cells is sufficient for licensing and missing-self killing.

      While the results of the study are largely consistent with the conclusions, it is important to address some discrepancies. For instance, in the title of Figure 1, the authors state that NK cells in Ly49KO mice compared to WT mice have a less mature phenotype, which is not consistent with the corresponding text in the Results section (lines 170-171) that states there is no difference in maturation. These differences are not evident in Figure 1, panel D. It is crucial to acknowledge these inconsistencies to ensure a comprehensive understanding of the research findings.

      We thank the reviewer for pointing this out. We have corrected the figure legend title to: “Mice generated to lack all NK-related Ly49 molecules using CRISPR have NK cells that display alterations in select surface molecules.”

      In the legend of Figure 2, the text related to panel C indicates the use of dyes to label the splenocytes, and CFSE, CTV, and CTFR were mentioned. However, only CTV and CTFR are shown on the plots and mentioned in the corresponding text in the Results section. Similarly, in the legend of Figure 4, which is related to panel C, the authors write that splenocytes were differentially labeled with CFSE and CTV as indicated; however, in Figure 4C and the Results section text, there is no mention of CFSE.

      We thank the reviewer for pointing out these inconsistencies. We did label target cells with CFSE to distinguish them from host cells. To clarify, we have done the following:

      We have removed CFSE from figure legends of Figure 2 and 4.

      We included the following on CFSE labeling in the Materials and Methods section: “Target splenocytes were additionally labeled with CFSE to identify transferred target splenocytes from host cells.”

      The authors should clarify why they assume that KLRG1 expression is influenced by the expression of inhibitory Ly49 receptors and not by manipulations on chromosome 6, where the genes for both KLRG1 and Ly49 receptors are located.

      The effect on KLRG1 expression is phenocopied in the Ly49A KI mice (on a Ly49 KO background). The Ly49A KI allele is encoded in the Ncr1 locus, which is located on chromosome 7 and not on chromosome 6 where KLRG1 is located, thus excluding involvement of cis-regulatory elements encoded by the Ly49 locus on chromosome 6.

      We have clarified this in the discussion section (lines 350-358):

      “The Ly49 gene family as well as Klrg1 is located within the NKC on chromosome 6 (Yokoyama and Plougastel, 2003) ….  expression of only Ly49A, encoded in the Ncr1 locus located on chromosome 7, in Ly49KO mice on a H-2D<sup>d</sup> background restored KLRG1 expression”

      However, a better explanation for the possible influence of other inhibitory NK cell receptors still needs to be included. In the study by Zhang et al. (doi: 10.1038/s41467-019-13032-5), the authors showed the synergized regulation of NK cell education by the NKG2A receptor and the specific Ly49 family members. Although in this study, Piersma and colleagues show the control of MHC-I deficient cells by Ly49A+ NKG2A- NK cells in Figure 4, this receptor is not mentioned in the Results or in the Discussion section, so its role in this story needs to be clarified. Therefore, the reader would benefit from more information regarding NKG2A receptor and NKG2A+/- populations in their results.

      We agree with the reviewer that it is important to describe our results in the context of other inhibitory receptors. To clarify the role of NKG2A and potentially other inhibitory receptors we have made the following improvements to our manuscript:

      We discuss the role of NKG2A in the discussion section, which now includes (lines 259-266):

      “While our results did not interrogate licensing by inhibitory receptors outside of the Ly49 receptor family, such as has been reported for NKG2A (Anfossi et al., 2006; Zhang et al., 2019), they do demonstrate that expression of Ly49A without other Ly49 family members can mediate NK cell licensing. Moreover, we found that Ly49 receptors are required and sufficient for missing-self rejection under steady-state conditions. However, these observations do not rule out involvement of other inhibitory receptors under specific inflammatory conditions. For example, NKG2A contributes to rejection of missing-self targets in poly(I:C)-treated mice (Zhang et al., 2019).”

      We also added the following to the result section (lines 179-182):

      NKG2A has been implicated in NK cell licensing by the non-classical MHC-I molecule Qa1 (Anfossi et al., 2006), to eliminate potential confounding effects by this interaction, effector functions of NKG2A- NK cells were evaluated as described before (Bern et al., 2017).

      Reviewer #3 (Public review):

      Summary:

      In this study, Piersma et al. successfully generated a mouse model with all Ly49 genes knocked out, resulting in the complete absence of Ly49 receptor expression on the cell surface. The absence of Ly49 expression led to the loss of NK cell education/licensing and consequently, a failure in responsiveness against missing-self target cells. The experimental work and findings are partially overlapping with the previous work by Zhang et al. (2019), who also performed knockout of the entire Ly49 locus in mice and demonstrated that loss of NK responsiveness was due to the removal of inhibitory, and not activating Ly49 genes. The authors demonstrate the restoration of NK cell licensing by knocking in a single Ly49 gene, Ly49A, in a mouse expressing the H-2D<sup>d</sup> ligand for this receptor, which is a novel and important finding.

      Strengths:

      The authors established a novel mouse model enabling them to have a clean and thorough study on the function of Ly49 on NK cell licensing. Also, by knocking in a single Ly49, they were able to investigate the function of a given Ly49 receptor excluding the "contamination" of co-expression of any other Ly49 genes. Their idea and method were novel, though the mouse model was somewhat genetically similar to that of a previous study. The experiment design and data interpretation were logically clear and the evidence was solid.

      Weaknesses:

      The paper is very poorly written and confusing. The authors should be more accurate in the usage of terminology, provide more details on experimental procedures, and revise much of the text to improve clarity and coherence. A thorough revision aiming to clarify the paper would be helpful.

      We regret that the manuscript was confusing to the reviewer. We have made thorough revisions to the different sections, which we hope will improve the clarity of the manuscript.

      We have made changes to all sections of the manuscript, including the title. These revisions include improved clarity on description of NK cell licensing and consistent usage throughout the manuscript, per the reviewer recommendations. We hope that all our improvements help the clarity of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I was confused by lines 262-270 in the discussion. The data from Hanke et al. is presented as contradictory to the observation that Ly49s bind more efficiently to H2-Kb than -Db, but they showed that Ly49c/i did not bind Kb-deficient cells, supporting the preferred binding to Kb.

      We have clarified this issue and the paragraph now reads: “This is further supported by early studies using Ly49 transfectants binding to Con A blasts showing that Ly49C and Ly49I can bind to H-2D<sup>b</sup>-deficient but not H-2K<sup>b</sup>-deficient cells (Hanke et al., 1999), despite the caveat of testing binding to cells overexpressing Ly49s in these studies.”

      Reviewer #2 (Recommendations for the authors):

      The authors' conclusion that one type of inhibitory Ly49 receptor expressed on NK cells is sufficient for successful licensing and rejection of missing self-cells is a significant step forward. However, it would be beneficial to complement this with additional data. For instance, exploring the role of a single inhibitory Ly49 receptor responsible for licensing in a mouse model with a different haplotype (e.g. Ly49C or Ly49I on H-2b MHC I haplotype in C57BL/6J mice) could provide valuable insights and open new avenues for research in the field.

      We agree with the reviewer that it will be important to extend our findings to additional MHC-I haplotypes with single cognate Ly49 receptors. While these experiments are subject of our ongoing studies, they are beyond the scope of the current manuscript considering the significant effort, time, and cost to generate these new Ly49C and Ly49I knockin mice.

      Reviewer #3 (Recommendations for the authors):

      Specific issues that should be addressed are as follows:

      (1) The title of the paper: "Expression of a single inhibitory Ly49 receptor is sufficient to license NK cells for effector functions" is ambiguous. When I first read the title, I thought the authors meant that only a single Ly49 molecule on the NK cell surface was necessary to induce licensing. It might be better to replace "single inhibitory receptor" with "single member of Ly49 receptor family".

      We have changed the title to: “Expression of a single inhibitory member of the Ly49 receptor family is sufficient to license NK cells for effector functions”

      (2) In the abstract, introduction, and results, the authors distinguish "licensing" and "rejection of missing-self targets" as two distinct phenomena. An example includes Abstract, lines 51-53: "Herein, we showed mice lacking expression of all Ly49s were unable to reject missing-self target cells in vivo, were defective in NK cell licensing, and displayed lower KLRG1 on the surface of NK cells". Similarly, the title of the second subsection of the Results states: "Ly49-deficient NK cells are defective in licensing and rejection of cognate MHC-I deficient target cells" (line 176). In these instances, it seems that by "licensing", they mean only response to plate-bound anti-NK1.1 stimulation and not a response to missing-self targets. Alternatively, in the first paragraph of the Discussion, it sounds as if licensing includes both anti-NK1.1 and missing-self responses (lines 258-260): "...NK cells were fully licensed in terms of their functional phenotype, including the capacity to be activated by an activation receptor in vitro and efficient rejection of MHC-I deficient target cells in vivo". Please define the terms and use the terms consistently throughout the paper.

      We were the first to describe the term licensing and have defined this as acquisition of NK cell functional competence by self-MHC molecules (Kim et al., 2005), which is characterized by increased NK cell effector functions to activating signals. Thus, licensed NK cells are prevented from attacking normal MHC-I<sup>+</sup> cells by the same self-MHC-I-specific receptor that conferred licensing, while unlicensed NK cells without appropriate Ly49 receptors are functionally incompetent.

      To clarify we made changes throughout the manuscript including the following:

      Lines 91-101:

      “In addition to effector function in missing-self, Ly49 receptors that recognize their cognate MHC-I ligands are involved in licensing or education of NK cells to acquire functional competence. NK cell licensing is characterized by potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation (Elliott et al., 2010; Kim et al., 2005). Like missing-self recognition, inhibitory Ly49s require SHP-1 for NK cell licensing which interacts with the ITIM-motif encoded in the cytosolic tail of inhibitory Ly49s (Bern et al., 2017; Kim et al., 2005; Viant et al., 2014). Moreover, lower expression of SHP-1, particularly within the immunological synapse, is associated with licensed NK cells (Schmied et al., 2023; Wu et al., 2021). Thus, inhibitory Ly49s have a second function that licenses NK cells to self-MHC-I thereby generating functionally competent NK cells but it has not been possible to exclude contributions from other co-expressed Ly49s.”

      Lines 268-271 (previously 258-260):

      “Yet the NK cells were fully licensed in terms of IFNγ production and degranulation in vitro and efficiently rejected MHC-I deficient target cells in vivo. Thus, a single Ly49 receptor is capable to confer the licensed phenotype and missing-self rejection in vitro and in vivo.”

      Lines 309-312:

      “In conclusion, these data show that expression of a single inhibitory Ly49 receptor is necessary and sufficient to license NK cells and mediate missing self-rejection under steady state conditions in vivo.”

      (3) Introduction, lines 76-79. Please provide the C57BL/6 MHC-I genotype. It is difficult to follow the text here without this information. In general, please provide information to help the reader who may not be working in this precise field.

      We thank the reviewer for pointing this out. We have included this and the lines now read: “For example, in the C57BL/6 background, Ly49C and Ly49I can recognize H-2<sup>b</sup> MHC-I molecules that include H-2K<sup>b</sup> and H-2D<sup>b</sup>, while Ly49A and Ly49G cannot recognize H-2<sup>b</sup> molecules and instead they recognize H-2<sup>d</sup> alleles.”

      (4) Introduction, lines 85-97. Please use commas: "...the MHC-I specificities of other Ly49s have been primarily studied with MHC tetramers containing human b2m, which is not recognized by Ly49A, on cells overexpressing Ly49s" in order to clarify the sentence.

      Commas have been added as suggested by the reviewer.

      (5) Introduction, lines 91-101. The whole paragraph starting with the following sentence does not make sense and should be re-written. "In addition to effector function in missing-self, when inhibitory Ly49 receptors recognize their cognate MHC-I ligands in vivo, they license or educate NK cells for potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation".

      We regret that this paragraph was not clear to the reviewer. We have changed this paragraph to:

      “In addition to effector function in missing-self, Ly49 receptors that recognize their cognate MHC-I ligands are involved in licensing or education of NK cells to acquire functional competence. NK cell licensing is characterized by potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation (Elliott et al., 2010; Kim et al., 2005). Like missing-self recognition, inhibitory Ly49s require SHP-1 for NK cell licensing which interacts with the ITIM-motif encoded in the cytosolic tail of inhibitory Ly49s (Bern et al., 2017; Kim et al., 2005; Viant et al., 2014). Moreover, lower expression of SHP-1, particularly within the immunological synapse, is associated with licensed NK cells (Schmied et al., 2023; Wu et al., 2021). Thus, inhibitory Ly49s have a second function that licenses NK cells to self-MHC-I thereby generating functionally competent NK cells but it has not been possible to exclude contributions from other co-expressed Ly49s.”

      (6) Results, line 181. Please edit: "...MHC-I-deficient H-2K<sup>b</sup> x H-2D<sup>b</sup> deficient (KODO) mice".

      This sentence now reads “... NK cells from H-2K<sup>b</sup> and H-2D<sup>b</sup> double deficient (KODO) mice”

      (7) Results, line 192. Please re-word the following phrase: "missing-self is dominated by H-2K<sup>b</sup> in the C57BL/6 background", as it is unclear. Do you mean that H-2K<sup>b</sup> is protected from lysis as opposed to H-2D<sup>b</sup>?

      We thank the reviewer for pointing this out, line 192 now reads: “..missing-self recognition in the C57BL/6 background depends on the absence of H-2K<sup>b</sup> rather than H-2D<sup>b</sup>.”

      (8) Please briefly describe the Ncr1-Ly49A knockin procedure so that the reader understands the link between NKp46 and Ly49A expression without going to the earlier paper. Also, it needs to be mentioned that Ncr1 is the gene encoding NKp46.

      Lines 201-205 now read: “To investigate the potential of a single inhibitory Ly49 receptor on mediating NK cell licensing and missing-self rejection, the Ly49KO mice were backcrossed to H-2D<sup>d</sup> transgenic KODO (D8-KODO) Ly49A KI mice that express Klra1 cDNA encoding the inhibitory Ly49A receptor in the Ncr1 locus encoding NKp46 and its cognate ligand H-2D<sup>d</sup> but not any other classical MHC-I molecules (Parikh et al., 2020).”

      In the Materials and Methods section, the following has been added (lines 324-326):

      “In Ly49A KI mice the stop codon of Ncr1 encoding NKp46 is replaced with a P2A peptide-cleavage site upstream of the Ly49A cDNA, while maintaining the 3’ untranslated region.”

      (9) Figure 4C, legend. There is no CFSE staining in this experiment. Please correct.

      We did label target cells with CFSE to distinguish them from host cells. To clarify, we have done the following:

      We have removed CFSE from figure legends of Figure 2 and 4.

      We included the following on CFSE labeling in the Materials and Methods section (lines 377-379): “Target splenocytes were additionally labeled with CFSE to identify transferred target splenocytes from host cells.”

      (10) Discussion, lines 262-270. This paragraph sounds as if data by Hanke et al. does not agree with the data presented in the paper. On the contrary, Hanke et al. demonstrate that Ly49C and Ly49I detectably bind to H-2K<sup>b</sup>, but poorly to H-2D<sup>b</sup>, supporting observations shown in Figure 2C.

      We have clarified this issue and the paragraph now reads: “This is further supported by early studies using Ly49 transfectants binding to Con A blasts showing that Ly49C and Ly49I can bind to H-2D<sup>b</sup>-deficient but not H-2K<sup>b</sup>-deficient cells (Hanke et al., 1999), despite the caveat of testing binding to cells overexpressing Ly49s in these studies.”

    1. eLife Assessment

      Zanetti et al. use convincing biophysical and cellular assays to investigate the interaction of the birnavirus VP3 protein with the early endosome lipid PI3P. The study provides valuable insights and will be of interest to virologists. In future studies, it would be interesting to demonstrate that VP3-PI3P is a specific interaction and not a general interaction with other PIPs.

    2. Reviewer #1 (Public review):

      Summary:

      Zanetti et al. use biophysical and cellular assays to investigate the interaction of the birnavirus VP3 protein with the early endosome lipid PI3P. The major novel finding is that association of the VP3 protein with an anionic lipid (PI3P) appears to be important for viral replication, as evidenced by a cellular assay measuring FFUs.

      Strengths:

      The work supports previously published claims that VP3 associates with the early endosome membrane, potentially through binding to PI3P. The finding that mutating a single residue (R200) critically affects early endosome binding and that the same mutation also inhibits viral replication suggests a very important role for this binding in the viral life cycle.

      Weaknesses:

      The manuscript is relatively narrowly focused: the specifics of the bi-molecular interaction between the VP3 of an unusual avian virus and a host cell lipid (PI3P). Further, the affinity of this interaction is low and its specificity relative to other PIPs is not tested, leading to questions about whether VP3-PI3P binding is relevant.

    3. Reviewer #3 (Public review):

      Summary:

      Infectious bursal disease virus (IBDV) is a birnavirus and an important avian pathogen. Interestingly, IBDV appears to be a unique dsRNA virus that uses early endosomes for RNA replication, a strategy more common among +ssRNA viruses such as SARS-CoV-2.

      This work builds on previous studies showing that IBDV VP3 interacts with PI3P during virus replication. The authors provide further biophysical evidence for the interaction and map the interacting domain on VP3.

      Strengths:

      Detailed characterization of the interaction between VP3 and PI3P identified the R200D mutation as critical for the interaction. Cryo-EM data show that VP3 leads to membrane deformation.

      Comments on revisions:

      I have no further comments. The authors have addressed my questions and concerns. I congratulate the authors on their work!

    1. Reviewer #2 (Public Review):

      When people help others is an important psychological and neuroscientific question. It has received much attention from the psychological side, but comparatively less from neuroscience. The paper translates some ideas from the social psychology domain to neuroscience using a neuroeconomically oriented computational approach. In particular, the paper is concerned with the idea that people help others based on perceptions of merit/deservingness, but also because they require/need help. To this end, the authors conduct two experiments with an overlapping participant pool:

      (1) A social perception task in which people see images of people that have previously been rated on merit and need scales by other participants. In a blockwise fashion, people decide whether the depicted person a) deserves help, b) needs help, and c) uses both hands (the control condition).

      (2) In an altruism task, people make costly helping decisions by deciding between giving a certain amount of money to themselves or another person. It is manipulated how much the other person needs and deserves the money.

      The authors use a sound and robust computational modelling approach for both tasks using evidence accumulation models. They analyse behavioural data for both tasks, showing that the behaviour is indeed influenced, as expected, by the deservingness and the need of the shown people. Neurally, the authors use a block-wise analysis approach to find differences in activity levels across conditions of the social perception task. The authors do find large activation clusters in areas related to theory of mind. Interestingly, they also find that activity in TPJ that relates to the deservingness condition correlates with people's deservingness ratings while they do the task, but also with computational parameters related to helping others in the second task, the one that was conducted many months later. Some behavioural parameters also correlate across the two tasks, suggesting that how deserving of help others are perceived to be reflects a relatively stable feature that translates into concrete helping decisions later on.

      The conclusions of the paper are overall well supported by the data.

      (1) I found that the modelling was done very thoroughly for both tasks. Overall, I had the impression that the methods are very solid with many supplementary analyses. The computational modelling is done very well.

      (2) A slight caveat, however, regarding this aspect, is that, in my view, the tasks are relatively simplistic, so that even the complex computational models do not as much as they can in the case of more complex paradigms. For example, the bias term in the model seems to correspond to the mean response rate in a very direct way (please correct me if I am wrong).

      (3) Related to the simple tasks: The fMRI data are analysed in a simple block fashion. This is, in my view, not appropriate to discern the more subtle neural substrates of merit/need-based decision making or person perception. Correspondingly, the neural activation patterns (merit > control, need > control) are relatively broad and unspecific. They do not seem to differ in the classic theory of mind regions that are the focus of the analyses.

      (4) However, the relationship between neural signal and behavioural merit sensitivity in TPJ is noteworthy.

      (5) The latter is even more the case, as the neural signal and aspects of the behaviour are correlated across subjects with the second task that is conducted much later. Such a correlation is very impressive and suggests that the tasks are sensitive for important individual differences in helping perception/behaviour.

      (6) That being said, the number of participants in the latter analyses is at the lower end of what is used these days for across-participant correlations.

    2. Reviewer #3 (Public Review):

      Summary:

      The paper aims to provide a neurocomputational account of how social perception translates into prosocial behaviors. Participants first completed a novel social perception task during fMRI scanning, in which they were asked to judge the merit or need of people depicted in different situations. Second, a separate altruistic choice task was used to examine how the perception of merit and need influences the weights people place on themselves, others and fairness when deciding to provide help. Finally, a link between perception and action was drawn in those participants who completed both tasks.

      Strengths:

      The paper is overall very well written and presented, leaving the reader at ease when describing complex methods and results. The approach used by the authors is very compelling, as it combines computational modeling of behavior and neuroimaging data analyses. Despite not being able to comment on the computational model, I find the approach used (to disentangle sensitivity and biases, for merit and need) very well described and derived from previous theoretical work. Results are also clearly described and interpreted.

      Weaknesses:

      In the social perception task, merit and need are evaluated by means of very different cues that rely on different cognitive processes (more abstract thinking for merit than need). Despite this limitation of the task, the authors were able to argue convincingly in the revised version about the solidity of their findings. The sample size is quite small for study 2; nevertheless, the results provide convincing evidence.

    3. eLife assessment

      These important findings stand out from other similar studies through a convincing demonstration of behavioural and neural relationships between two helping tasks – one focusing more on social perception, one more on its influence on social behaviour – that were performed more than 300 days apart. The claims, however, would be stronger with a larger sample size.

    1. eLife Assessment

      This study presents a useful finding that targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer. The evidence supporting the claims of the authors is solid and the authors may want to validate their results in additional cell lines to strengthen their conclusions. Moreover, the authors should clarify the source of patient samples and why the manuscript focused on epigenetic regulations instead of major transcription factors. The work will be of interest to scientists working in the field of breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab-resistant cells utilize cysteine metabolism.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

    3. Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab-resistant HER2-positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was poorly supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines - JIMT1 as a representative trastuzumab-resistant cell line, and SKBR3 as a representative trastuzumab-sensitive cell line. As such, these findings could be cell-line specific and unrelated to trastuzumab sensitivity altogether. Furthermore, the authors claimed ferroptosis simply based on lipid peroxidation (Figure 3). Cell viability was not determined, and the rescuing effects of ferroptosis inhibitors were missing. The xenograft experiments were also suspicious (Figure 4). The description of how cysteine starvation was performed on xenograft tumors was lacking, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, it is confusing why the authors focused on epigenetic regulations (Figures 5 & 6), without measuring major transcription factors (e.g., NRF2, ATF4) which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab-sensitive and primary resistant HER2-positive breast cancer patients.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab-resistant cells utilize cysteine metabolism.

      Thank you for your valuable comments. We would like to extend our appreciation for your efforts. Your constructive suggestions will help improve our research.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      Thank you for your valuable comments. In the transcriptomic analysis, we included data from HER2-positive breast cancer patients who received trastuzumab in the I-SPY2 trial (GSE181574). Tumor tissues were used in this dataset.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      Thank you for your valuable comments. We agree with your opinion, and the exploration of multiple cell lines would make our research findings more comprehensive. This is a limitation of our study, and we will continue to improve our design and methods in future experiments.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Thank you for your valuable comments. We agree with your suggestions. The exploration of metastatic disease would make our research more meaningful and help better address key clinical issues. In our future studies, we will continue to investigate the association between the invasive and metastatic capabilities of trastuzumab-resistant HER2-positive breast cancer and cysteine metabolism.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      Thank you for your valuable comments. We will include the statistical information in our figure legends.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

      Thank you for your valuable comments. We will clarify the comparison information in our figure legends.

      Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab-resistant HER2-positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was poorly supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines - JIMT1 as a representative trastuzumab-resistant cell line, and SKBR3 as a representative trastuzumab-sensitive cell line. As such, these findings could be cell-line specific and unrelated to trastuzumab sensitivity altogether. Furthermore, the authors claimed ferroptosis simply based on lipid peroxidation (Figure 3). Cell viability was not determined, and the rescuing effects of ferroptosis inhibitors were missing. The xenograft experiments were also suspicious (Figure 4). The description of how cysteine starvation was performed on xenograft tumors was lacking, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, it is confusing why the authors focused on epigenetic regulations (Figures 5 & 6), without measuring major transcription factors (e.g., NRF2, ATF4) which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab-sensitive and primary resistant HER2-positive breast cancer patients.

      Thank you for your valuable comments. We agree with your suggestions. Your feedback will help enhance the quality of our research.

      (1) Our research was mainly conducted in JIMT1 (trastuzumab resistant) and SKBR3 (trastuzumab sensitive), and this is a limitation of our study. The experimental validation using different cell lines will make our research findings more persuasive. In our future research, we will continuously optimize experimental design and methods to make our findings more comprehensive.

      (2) The detection of ferroptosis in our research was mainly performed by evaluating the lipid peroxidation. Experiments measuring cell viability and rescuing effects would help provide more evidence.

      (3) In the xenograft experiments, cysteine starvation was achieved by feeding a cysteine-free diet. The drug dissolution and other conditions were optimized by referring to previous relevant literature. We will clarify these details in our article.

      (4) Epigenetic modifications have been recognized as crucial factors in the development of drug resistance. An increasing number of studies have emphasized the importance of epigenetic changes in regulating the abnormal expression of oncogenes and tumor suppressor genes related to drug resistance. Currently, the role of epigenetic changes in the development of trastuzumab resistance in HER2-positive breast cancer is still being explored. We sought to investigate the dysregulation of histone modifications and DNA methylation in trastuzumab-resistant HER2-positive breast cancer. Our findings indicated that targeting H3K4me3 and DNA methylation could decrease SLC7A11 expression and induce ferroptosis. This provides more evidence for exploring trastuzumab resistance mechanisms. We will provide a more detailed discussion in the article.

      We would like to extend our appreciation for your constructive suggestions and continue to improve our research in future experiments.

    1. eLife Assessment

      This is an important study reporting that activation of the presynaptic GPR55 receptor suppresses synaptic transmission by modulating GABA release through the reduction of the readily releasable pool without affecting the presynaptic AP waveform and calcium influx. The evidence supporting this claim is compelling and based on an impressive array of techniques including patch-clamp recordings from the axon terminals of cerebellar Purkinje cells and fluorescent imaging of vesicular exocytosis. However, a few technical issues leave some questions open; these include uncertainty regarding the specificity of pharmacological agents and the nature of the endogenous process that would activate this pathway in vivo. In the current form, the evidence indicating that synaptic vesicles become insensitive to VGCC activation in the presence of GPR55 is weak and would need to be supported with additional experimental data.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors report that GPR55 activation in presynaptic terminals of Purkinje cells decreases GABA release at the PC-DCN synapse. The authors use an impressive array of techniques (including highly challenging presynaptic recordings) to show that GPR55 activation reduces the readily releasable pool of vesicles without affecting presynaptic AP waveform and presynaptic Ca2+ influx. This is an interesting study, which is seemingly well-executed and proposes a novel mechanism for the control of neurotransmitter release. However, the authors' main conclusions are heavily, if not solely, based on pharmacological agents that more often than not demonstrate affinity at multiple targets. Below are points that the authors should consider in a revised version.

      Major points:

      (1) There is no clear evidence that GPR55 is specifically expressed in presynaptic terminals at the PC-DCN synapse. The authors cited Ryberg 2007 and Wu 2013 in the introduction, mentioning that GPR55 is potentially expressed in PCs. Ryberg (2007) offers no such evidence, and the expression in PC suggested by Wu (2013) does not necessarily correlate with presynaptic expression. The authors should perform additional experiments to demonstrate the presynaptic expression of GPR55 at PC-DCN synapse.

      (2) The authors' conclusions rest heavily on pharmacological experiments, with compounds that are sometimes not selective for single targets. Genetic deletion of GPR55 would be a more appropriate control. The authors should also expand their experiments with occlusion experiments, showing if the effects of LPI are absent after AM251 or O-1602 treatment. In addition, the authors may want to consider AM281 as a CB1R antagonist without reported effects at GPR55.

      (3) It is not clear how long the different drugs were applied, and at what time the recordings were performed during or following drug application. It appears that GPR55 agonists can have transient effects (Sylantyev, 2013; Rosenberg, 2023), possibly due to receptor internalization. The timeline of drug application should be reported, where IPSC amplitude is shown as a function of time and drug application windows are illustrated.

      (4) A previous investigation on the role of GPR55 in the control of neurotransmitter release is neither cited nor discussed: Sylantyev et al. (2013, PNAS, Cannabinoid- and lysophosphatidylinositol-sensitive receptor GPR55 boosts neurotransmitter release at central synapses). Similarities and differences should be discussed.

      Minor point:

      (1) What is the source of LPI? What isoform was used? The multiple isoforms of LPI have different affinities for GPR55.

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates the mode of action of GPR55, a relatively understudied type of cannabinoid receptor, in presynaptic terminals of Purkinje cells. The authors use demanding techniques of patch clamp recording of the terminals, sometimes coupled with another recording of the postsynaptic cell. They find a lower release probability of synaptic vesicles after activation of GPR55 receptors, while presynaptic voltage-dependent calcium currents are unaffected. They propose that the size of a specific pool of synaptic vesicles supplying release sites is decreased upon activation of GPR55 receptors.

      Strengths:

      The paper uses cutting-edge techniques to shed light on a little-studied, potentially important type of cannabinoid receptor. The results are clearly presented, and the conclusions are for the most part sound.

      Weaknesses:

      The nature of the vesicular pool that is modified following activation of GPR55 is not definitively characterized.

    4. Reviewer #3 (Public review):

      Summary:

      Inoshita and Kawaguchi investigated the effects of GPR55 activation on synaptic transmission in vitro. To address this question, they performed direct patch-clamp recordings from axon terminals of cerebellar Purkinje cells and fluorescent imaging of vesicular exocytosis utilizing synapto-pHluorin. They found that exogenous activation of GPR55 suppresses GABA release at Purkinje cell to deep cerebellar nuclei (PC-DCN) synapses by reducing the readily releasable pool (RRP) of vesicles. This mechanism may also operate at other synapses.

      Strengths:

      The main strength of this study lies in combining patch-clamp recordings from axon terminals with imaging of presynaptic vesicular exocytosis to reveal a novel mechanism by which activation of GPR55 suppresses inhibitory synaptic strength. The results strongly suggest that GPR55 activation reduces the RRP size without altering presynaptic calcium influx.

      Weaknesses:

      The study relies on the exogenous application of GPR55 agonists. It remains unclear whether endogenous ligands released due to physiological or pathological activities would have similar effects. There is no information regarding the time course of the agonist-induced suppression. There is also little evidence that GPR55 is expressed in Purkinje cells. This study would benefit from using GPR55 knockout (KO) mice. The downstream mechanism by which GPR55 mediates the suppression of GABA release remains unknown.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors report that GPR55 activation in presynaptic terminals of Purkinje cells decreases GABA release at the PC-DCN synapse. The authors use an impressive array of techniques (including highly challenging presynaptic recordings) to show that GPR55 activation reduces the readily releasable pool of vesicles without affecting presynaptic AP waveform and presynaptic Ca2+ influx. This is an interesting study, which is seemingly well-executed and proposes a novel mechanism for the control of neurotransmitter release. However, the authors' main conclusions are heavily, if not solely, based on pharmacological agents that more often than not demonstrate affinity at multiple targets. Below are points that the authors should consider in a revised version.

      We thank the reviewer for the encouraging comments, and will fully address the reviewer’s concerns as detailed below.

      Major points:

      (1) There is no clear evidence that GPR55 is specifically expressed in presynaptic terminals at the PC-DCN synapse. The authors cited Ryberg 2007 and Wu 2013 in the introduction, mentioning that GPR55 is potentially expressed in PCs. Ryberg (2007) offers no such evidence, and the expression in PC suggested by Wu (2013) does not necessarily correlate with presynaptic expression. The authors should perform additional experiments to demonstrate the presynaptic expression of GPR55 at PC-DCN synapse.

      We agree with the reviewer’s concern that the present manuscript lacks evidence for the localization of GPR55 at PC axon terminals. Honestly, our previous attempt to immunolabel GPR55 did not work well. We now realize that different antibodies are commercially available, and we are going to test them. In the revised manuscript, we hope to present immunocytochemical images showing GPR55 at terminals of PCs.

      (2) The authors' conclusions rest heavily on pharmacological experiments, with compounds that are sometimes not selective for single targets. Genetic deletion of GPR55 would be a more appropriate control. The authors should also expand their experiments with occlusion experiments, showing if the effects of LPI are absent after AM251 or O-1602 treatment. In addition, the authors may want to consider AM281 as a CB1R antagonist without reported effects at GPR55.

      We thank the reviewer for pointing out the essential issue regarding the specificity of GPR55 activation in our study. Regarding direct manipulation of GPR55, such as genetic deletion, we will try acute knock-down of its expression, considering the possibility of compensation, which sometimes occurs when a complete knock-out is performed. In addition, following the reviewer’s suggestion, we will examine whether the effects of LPI and AM251 occlude each other, and also perform control experiments showing the lack of CB1R involvement.

      (3) It is not clear how long the different drugs were applied, and at what time the recordings were performed during or following drug application. It appears that GPR55 agonists can have transient effects (Sylantyev, 2013; Rosenberg, 2023), possibly due to receptor internalization. The timeline of drug application should be reported, where IPSC amplitude is shown as a function of time and drug application windows are illustrated.

      As suggested, the timing and duration of drug application will be indicated together with the time course of changes of IPSC amplitudes. This change will make things much clearer. Thank you for the suggestion.

      (4) A previous investigation on the role of GPR55 in the control of neurotransmitter release is neither cited nor discussed: Sylantyev et al. (2013, PNAS, Cannabinoid- and lysophosphatidylinositol-sensitive receptor GPR55 boosts neurotransmitter release at central synapses). Similarities and differences should be discussed.

      We are really sorry for missing this important study in discussion and citation. In the revised version, of course, we will discuss their findings and our data.

      Minor point:

      (1) What is the source of LPI? What isoform was used? The multiple isoforms of LPI have different affinities for GPR55.

      We are sorry for the insufficient explanation of the LPI used in our study. We used LPI derived from soy (Merck, catalog #L7635) that was estimated to contain 58% C16:0 and 42% C18:0 or C18:2 LPI. This information will be added to the Materials and Methods in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the mode of action of GPR55, a relatively understudied type of cannabinoid receptor, in presynaptic terminals of Purkinje cells. The authors use demanding techniques of patch clamp recording of the terminals, sometimes coupled with another recording of the postsynaptic cell. They find a lower release probability of synaptic vesicles after activation of GPR55 receptors, while presynaptic voltage-dependent calcium currents are unaffected. They propose that the size of a specific pool of synaptic vesicles supplying release sites is decreased upon activation of GPR55 receptors.

      Strengths:

      The paper uses cutting-edge techniques to shed light on a little-studied, potentially important type of cannabinoid receptor. The results are clearly presented, and the conclusions are for the most part sound.

      We are really happy to hear the encouraging comments from the reviewer.

      Weaknesses:

      The nature of the vesicular pool that is modified following activation of GPR55 is not definitively characterized.

      During revision, we will perform further analysis and additional experiments to obtain deeper insights into the vesicle pools affected by GPR55 as much as possible.

      Reviewer #3 (Public review):

      Summary:

      Inoshita and Kawaguchi investigated the effects of GPR55 activation on synaptic transmission in vitro. To address this question, they performed direct patch-clamp recordings from axon terminals of cerebellar Purkinje cells and fluorescent imaging of vesicular exocytosis utilizing synapto-pHluorin. They found that exogenous activation of GPR55 suppresses GABA release at Purkinje cell to deep cerebellar nuclei (PC-DCN) synapses by reducing the readily releasable pool (RRP) of vesicles. This mechanism may also operate at other synapses.

      Strengths:

      The main strength of this study lies in combining patch-clamp recordings from axon terminals with imaging of presynaptic vesicular exocytosis to reveal a novel mechanism by which activation of GPR55 suppresses inhibitory synaptic strength. The results strongly suggest that GPR55 activation reduces the RRP size without altering presynaptic calcium influx.

      We thank the reviewer for the positive evaluation of our conclusions.

      Weaknesses:

      The study relies on the exogenous application of GPR55 agonists. It remains unclear whether endogenous ligands released due to physiological or pathological activities would have similar effects. There is no information regarding the time course of the agonist-induced suppression. There is also little evidence that GPR55 is expressed in Purkinje cells. This study would benefit from using GPR55 knockout (KO) mice. The downstream mechanism by which GPR55 mediates the suppression of GABA release remains unknown.

      We agree with the reviewer on all the points raised as weaknesses. Most issues will be made much clearer by the additional experiments and analyses described above in response to the issues raised by the other reviewers. The conditions under which endogenous ligands for GPR55 cause synaptic depression, and the downstream mechanism, are very important issues. We are going to discuss these points in the revised manuscript, and would like to work on them in future studies.

    1. eLife Assessment

      Overall, this is an important work: the new methodology of hamFISH is a key additional tool for the assessment of the expression of multiple genes simultaneously. The authors provide convincing evidence of the utility of this approach on Medial Amygdala (MeA) tissue, leveraging a previous transcriptomic dataset for gene selection. The authors also present a deeper dive into putative relationships between the on-tissue expression of subsets of genes and connectivity and behavioral regulation. The putative biological insights are intriguing, although preliminary, but notably they set up questions for future studies.

    2. Reviewer #1 (Public review):

      In their paper entitled "Combined transcriptomic, connectivity, and activity profiling of the medial amygdala using highly amplified multiplexed in situ hybridization (hamFISH)" Edwards et al. present a new method designated as hamFISH (highly amplified multiplexed in situ hybridization) that enables sequential detection of ≤32 genes using multiplexed branched DNA amplification. As proof-of-principle, the authors apply the new technique - in conjunction with connectivity, and activity profiling - to the medial amygdala (MeA) of the mouse, which is a critical nucleus for innate social and defensive behaviors.

      As mentioned by Edwards et al., hamFISH could prove beneficial as an affordable alternative to other in situ transcriptomic methods, including commercial platforms, that are resource-intensive and require complex analysis pipelines. Thus, the authors envision that the method they present could democratize in situ cell-type identification in individual laboratories.

      The data presented by Edwards et al. is convincing. The authors use the appropriate and validated methodology in line with the current state-of-the-art. The paper makes a strong case for the benefits of hamFISH when combining transcriptomics studies with connectivity tracing and immediate early gene-based activity profiling. Notably, the authors also discuss the caveats and limitations of their study/approach in an open and transparent manner.

      In its current state, the manuscript touches upon a number of most intriguing, yet rather preliminary findings. For example, the roles of inhibitory neuron cluster i3 or of the selective and apparently MeA neuron-specific projections (Figure 3 - Figure Supplement 2D) remain elusive. As it is the authors' prime intent to provide "a proof-of-principle example of overlaying transcriptomic types, projection, and activity in a behaviorally relevant manner and demonstrates the usefulness of hamFISH in multiplexed in situ gene expression profiling", such studies might be beyond the scope of the present manuscript. The absence of such more in-depth hypothesis-based analysis, however, prevents an even more enthusiastic overall assessment.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe the development and implementation of hamFISH, a sensitive multiplexed ISH method. They leverage a pre-existing scRNA-seq dataset for the MeA to design 32 probes that combinatorically represent MeA neuronal populations - ~80% of MeA neurons express three of these markers. Using these markers to assess the spatial organization of the MeA, the authors identify a novel population of Ndnf+ projection neurons and characterize their connectivity with anterograde and retrograde labeling. They additionally combine hamFISH with CTB labeling of three principal MeA projection sites to show that 75% of MeA neurons have only a single projection target. Finally, they engage adult male mice in encounters with other adult males (aggression), females (mating), and pups (infanticide), followed by hamFISH and c-fos labeling to relate cell identity to behavior. Their overall conclusion is that hamFISH-defined cell types are broadly active to multiple sensory stimuli. However, the data presented are not sufficient to conclude that no selectivity exists within the MeA. A weakness of the study is that the selected hamFISH genes contain only Lhx6 as a lineage-marking transcription factor. Instead, the authors predominately use neuropeptides as markers. Genes such as Tac1, Cartpt, Adcyap1, Calb1, and Gal are expressed throughout the MeA, and many other brain regions; they are not restricted to a single transcriptomic cell type and they do not denote any developmental origins. By design, the panel has low cell type specificity as all MeA neurons express at least three of the genes. Therefore, the authors' conclusions may not hold with a more stringent classification of cell type or cell identity.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Edwards et al. describe hamFISH, a customizable and cost-efficient method for performing targeted spatial transcriptomics. hamFISH utilizes highly amplified multiplexed branched DNA amplification, and the authors extensively describe hamFISH development and its advantages over prior variants of this approach.

      The authors then used hamFISH to investigate an important circuit in the mouse brain for social behavior, the medial amygdala (MeA). To develop a hamFISH probe set capable of distinguishing MeA neurons, the authors mined published single-cell RNA-sequencing datasets of the MeA, ultimately creating a panel of 32 hamFISH probes that mostly cover the identified MeA cell types. They evaluated over 600,000 MeA cells and classified neurons into 16 inhibitory and 10 excitatory types, many of which are spatially clustered. The authors combined hamFISH with viral and other circuit tracer injections to determine whether the identified MeA cell populations sent and/or received unique inputs from connected brain regions, finding evidence that several cell types had unique patterns of input and output. Finally, the authors performed hamFISH on the brains of male mice that were placed in behavioral conditions that elicit aggressive, infanticidal, or mating behaviors, finding that some cell populations are selectively activated (as assessed by c-fos mRNA expression) in specific social contexts.

      Strengths:

      (1) The authors developed an optimized tissue preparation protocol for hamFISH and implemented oligopools instead of individually synthesized oligonucleotides to reduce costs. The branched DNA amplification scheme improved smFISH signal compared to previous methods, and multiple variants provide additional improvements in signal intensity and specificity. Compared to other spatial transcriptomics methods, the pipeline for imaging and analysis is streamlined and is compatible with other techniques like fluorescence-based circuit tracing. This approach is cost-effective and has several advantages that make it a valuable addition to the list of spatial transcriptomics toolkits.

      (2) Using 31 probes, hamFISH was able to detect 16 inhibitory and 10 excitatory neuron types in the MeA subregions, including the vast majority of cell types identified by other transcriptomics approaches. The authors quantified the distributions of these cell types along the anterior-posterior, dorsal-ventral, and medial-lateral axes, finding spatial segregation among some, but not all, MeA excitatory and inhibitory cell types. The authors additionally identified a class of inhibitory neurons expressing Ndnf (and a subset of these that express Chrna7) that project to multiple social chemosensory circuits.

      (3) The authors combined hamFISH with MeA input and output mapping, finding cell-type biases in the projections to the MPOA, BNST, and VMHvl, and inputs from multiple regions.

      (4) The authors identified excitatory and inhibitory cell types, and patterns of activity across cell types, that were selectively activated during various social behaviors, including aggression, mating, and infanticide, providing new insights and avenues for future research into MeA circuit function.

      Weaknesses:

      (1) Gene selection for hamFISH is likely to still be a limiting factor, even with the expanded (32-probe) capacity. This may have contributed to the lack of ability to identify sexually dimorphic cell types (Figure S2B). This is an expected tradeoff for a method that has major advantages in terms of cost and adaptability.

      (2) Adapting hamFISH, for example to other brain regions or tissues, may require extensive optimization.

      (3) Pairing this method with behavioral experiments is likely to require further optimization, as c-fos mRNA expression is an indirect and incomplete survey of neuronal activity (e.g. not all cell types upregulate c-fos when electrically active). As such, there is a risk of false negative results that limit its utility for understanding circuit function.

      (4) The limited compatibility of hamFISH with thicker tissue samples and lack of optical sectioning introduce additional technical limitations. For example, it would be difficult to densely sample larger neural circuits using serial 20 micron sections. Also, because the imaging modality is not clear from the methods, it is difficult to know whether the analysis methods introduce the risk of misattributing gene expression to overlapping cells.

    5. Author response:

      Reviewer #1:

      In their paper entitled "Combined transcriptomic, connectivity, and activity profiling of the medial amygdala using highly amplified multiplexed in situ hybridization (hamFISH)" Edwards et al. present a new method designated as hamFISH (highly amplified multiplexed in situ hybridization) that enables sequential detection of ≤32 genes using multiplexed branched DNA amplification. As proof-of-principle, the authors apply the new technique - in conjunction with connectivity, and activity profiling - to the medial amygdala (MeA) of the mouse, which is a critical nucleus for innate social and defensive behaviors.

      As mentioned by Edwards et al., hamFISH could prove beneficial as an affordable alternative to other in situ transcriptomic methods, including commercial platforms, that are resource-intensive and require complex analysis pipelines. Thus, the authors envision that the method they present could democratize in situ cell-type identification in individual laboratories.

      The data presented by Edwards et al. is convincing. The authors use the appropriate and validated methodology in line with the current state-of-the-art. The paper makes a strong case for the benefits of hamFISH when combining transcriptomics studies with connectivity tracing and immediate early gene-based activity profiling. Notably, the authors also discuss the caveats and limitations of their study/approach in an open and transparent manner.

      In its current state, the manuscript touches upon a number of most intriguing, yet rather preliminary findings. For example, the roles of inhibitory neuron cluster i3 or of the selective and apparently MeA neuron-specific projections (Figure 3 - Figure Supplement 2D) remain elusive. As it is the authors' prime intent to provide "a proof-of-principle example of overlaying transcriptomic types, projection, and activity in a behaviorally relevant manner and demonstrates the usefulness of hamFISH in multiplexed in situ gene expression profiling", such studies might be beyond the scope of the present manuscript. The absence of such more in-depth hypothesis-based analysis, however, prevents an even more enthusiastic overall assessment.

      We thank the reviewer for their positive assessment and agree that further studies are needed to explore and understand the MeA circuit further.

      Reviewer #2:

      The authors describe the development and implementation of hamFISH, a sensitive multiplexed ISH method. They leverage a pre-existing scRNA-seq dataset for the MeA to design 32 probes that combinatorically represent MeA neuronal populations - ~80% of MeA neurons express three of these markers. Using these markers to assess the spatial organization of the MeA, the authors identify a novel population of Ndnf+ projection neurons and characterize their connectivity with anterograde and retrograde labeling. They additionally combine hamFISH with CTB labeling of three principal MeA projection sites to show that 75% of MeA neurons have only a single projection target. Finally, they engage adult male mice in encounters with other adult males (aggression), females (mating), and pups (infanticide), followed by hamFISH and c-fos labeling to relate cell identity to behavior. Their overall conclusion is that hamFISH-defined cell types are broadly active to multiple sensory stimuli. However, the data presented are not sufficient to conclude that no selectivity exists within the MeA. A weakness of the study is that the selected hamFISH genes contain only Lhx6 as a lineage-marking transcription factor. Instead, the authors predominately use neuropeptides as markers. Genes such as Tac1, Cartpt, Adcyap1, Calb1, and Gal are expressed throughout the MeA, and many other brain regions; they are not restricted to a single transcriptomic cell type and they do not denote any developmental origins. By design, the panel has low cell type specificity as all MeA neurons express at least three of the genes. Therefore, the authors' conclusions may not hold with a more stringent classification of cell type or cell identity.

      We agree with the reviewer that a deeper level of cell type classification may reveal the selectivity of cell types that may have been missed. The design of our hamFISH bridge-readout probes allows modification to be compatible with a barcoded readout system such as MERFISH, which would substantially increase the number of genes that can be included in the gene panel. This would, however, increase the complexity of the analysis pipeline and reduce throughput, but would be a potential avenue to explore to define MeA cell types at a deeper level. An advantage of hamFISH is the ease of including and reading out alternative gene panels. For example, one panel could examine developmental-lineage-specific genes. Overall, our panel captures the highest hierarchical level (similar to the subclass level of the Allen taxonomy) of MeA transcriptomic types, based on published data available at the time of our gene panel design. Genes including Tac1, Cartpt, Adcyap1, Calb1, and Gal are expressed in specific patterns within the MeA and are useful for classification. In the original manuscript, we also included our rationale for dropping Foxp2, a lineage-specific marker gene in the MeA.

      Reviewer #3:

      In this manuscript, Edwards et al. describe hamFISH, a customizable and cost-efficient method for performing targeted spatial transcriptomics. hamFISH utilizes highly amplified multiplexed branched DNA amplification, and the authors extensively describe hamFISH development and its advantages over prior variants of this approach.

      The authors then used hamFISH to investigate an important circuit in the mouse brain for social behavior, the medial amygdala (MeA). To develop a hamFISH probe set capable of distinguishing MeA neurons, the authors mined published single-cell RNA-sequencing datasets of the MeA, ultimately creating a panel of 32 hamFISH probes that mostly cover the identified MeA cell types. They evaluated over 600,000 MeA cells and classified neurons into 16 inhibitory and 10 excitatory types, many of which are spatially clustered. The authors combined hamFISH with viral and other circuit tracer injections to determine whether the identified MeA cell populations sent and/or received unique inputs from connected brain regions, finding evidence that several cell types had unique patterns of input and output. Finally, the authors performed hamFISH on the brains of male mice that were placed in behavioral conditions that elicit aggressive, infanticidal, or mating behaviors, finding that some cell populations are selectively activated (as assessed by c-fos mRNA expression) in specific social contexts.

      Strengths:

      (1) The authors developed an optimized tissue preparation protocol for hamFISH and implemented oligopools instead of individually synthesized oligonucleotides to reduce costs. The branched DNA amplification scheme improved smFISH signal compared to previous methods, and multiple variants provide additional improvements in signal intensity and specificity. Compared to other spatial transcriptomics methods, the pipeline for imaging and analysis is streamlined and is compatible with other techniques like fluorescence-based circuit tracing. This approach is cost-effective and has several advantages that make it a valuable addition to the list of spatial transcriptomics toolkits.

      (2) Using 31 probes, hamFISH was able to detect 16 inhibitory and 10 excitatory neuron types in the MeA subregions, including the vast majority of cell types identified by other transcriptomics approaches. The authors quantified the distributions of these cell types along the anterior-posterior, dorsal-ventral, and medial-lateral axes, finding spatial segregation among some, but not all, MeA excitatory and inhibitory cell types. The authors additionally identified a class of inhibitory neurons expressing Ndnf (and a subset of these that express Chrna7) that project to multiple social chemosensory circuits.

      (3) The authors combined hamFISH with MeA input and output mapping, finding cell-type biases in the projections to the MPOA, BNST, and VMHvl, and inputs from multiple regions.

      (4) The authors identified excitatory and inhibitory cell types, and patterns of activity across cell types, that were selectively activated during various social behaviors, including aggression, mating, and infanticide, providing new insights and avenues for future research into MeA circuit function.

      Weaknesses:

      (1) Gene selection for hamFISH is likely to still be a limiting factor, even with the expanded (32-probe) capacity. This may have contributed to the lack of ability to identify sexually dimorphic cell types (Figure S2B). This is an expected tradeoff for a method that has major advantages in terms of cost and adaptability.

      We recognise that the 32-plex gene detection might not be sufficient to address key questions in the transcriptomic organization of innate social behavior circuits, and that the study fell short of addressing more quantitative gene expression differences between sexes. Detecting sexually dimorphic gene expression likely requires a more targeted approach, because the dimorphism lies in quantitative expression differences rather than in binary expression of marker genes, and the gene panel needs to be specifically configured for this purpose.

      (2) Adapting hamFISH, for example to other brain regions or tissues, may require extensive optimization.

      We have successfully performed hamFISH on at least two other mouse brain regions without needing to optimize further, suggesting that compatibility with other mouse brain regions is not an issue. We recognise, however, that optimization of hamFISH may be required for its application in other types of tissue or species. Human brain tissue, for example, typically suffers from high autofluorescence, and different tissue preparation methods may need to be employed. We note that the signal boost provided by the hamFISH v2 amplifiers may be useful to this end.

      (3) Pairing this method with behavioral experiments is likely to require further optimization, as c-fos mRNA expression is an indirect and incomplete survey of neuronal activity (e.g. not all cell types upregulate c-fos when electrically active). As such, there is a risk of false negative results that limit its utility for understanding circuit function.

      We acknowledge that c-fos is not the only readout of neuronal activity and that a panel of immediate early genes would allow a more comprehensive readout of activity-dependent gene expression. We fully agree that immediate early gene induction is an indirect readout of neural activity, and alternative methods such as in vivo physiology would provide a complementary insight into the selectivity of MeA neuron responses.

      (4) The limited compatibility of hamFISH with thicker tissue samples and lack of optical sectioning introduce additional technical limitations. For example, it would be difficult to densely sample larger neural circuits using serial 20 micron sections. Also, because the imaging modality is not clear from the methods, it is difficult to know whether the analysis methods introduce the risk of misattributing gene expression to overlapping cells.

      We agree that the use of hamFISH as described here is restricted to thin (<20 µm) sections. We have shown, however, that our encoding probe and bridge-readout probe design are compatible with HCR-based mRNA detection, which is compatible with thicker sections. Regarding the misattribution of gene expression to overlapping cells in the z-axis, we used epifluorescence microscopy with 14x 500 nm z-steps to collect our raw data and generate maximum intensity projections for further analysis. Because of the thin sections (10 µm) used for the imaging, the overlap between cells in z is expected to be minimal. Regarding throughput, we agree that hamFISH is likely not suitable for brain-wide questions that require large volume coverage, but its major advantage is that it allows routine use of low-level multiplexing for targeted brain areas.

    1. eLife Assessment

      In this valuable contribution, the authors present an approach based on a complex systems theoretical framework to characterize diet-host-microbe interactions and to develop targeted bacteriotherapies using a three-phase workflow. Overall, the solid results provide a reference for microbial community research and insights to guide future studies. However, the theoretical systems approach would benefit from further description, and some claims regarding oxalate bacterial metabolism in complex microbial communities could be strengthened. This study will interest researchers working on gut microbiomes, specifically those seeking to modulate host-microbial interactions.

    2. Reviewer #1 (Public review):

      Summary:

      This study experimentally examined diet-microbe-host interactions through a complex systems framework, centered on dietary oxalate. Multiple independent molecular, animal, and in vitro experimental models were used. The authors found that microbiome composition influenced multiple oxalate-microbe-host interfaces. Oxalobacter formigenes was only effective against a poor oxalate-degrading microbiota background, a finding that gives critical new insight into why clinical intervention trials with this species exhibit variable outcomes. The data suggest that, while heterogeneity in the microbiome impacts multiple diet-host-microbe interfaces, metabolic redundancy among diverse microorganisms in specific diet-microbe axes is a critical variable that may impact the efficacy of bacteriotherapies, which can help guide patient and probiotic selection criteria in probiotic clinical trials.

      Strengths:

      The paper has made significant progress in both the depth and breadth of scientific research by systematically comparing multiple experimental methods across multiple dimensions. Particularly through in-depth analysis from the enzymatic perspective, it has not only successfully identified several key strains and redundant genes, which is of great significance for understanding the functions of enzymes, the characteristics of strains, and the mechanisms of genes in microbial communities, but also provided a valuable reference for subsequent experimental design and theoretical research.

      More importantly, the establishment of a novel research approach to probiotics and gut microbiota in this paper represents a major contribution to the current research field. This new approach not only breaks through the limitations of traditional research but also offers new perspectives and strategies for the screening and optimization of probiotics and the regulation of gut microbiota balance. This holds potentially significant value for improving human health and for the prevention and treatment of related diseases.

      Weaknesses:

      While the study has excellently examined the overall changes in microbial community structure and the functions of individual bacteria, it lacks a focused investigation on the metabolic cross-feeding relationships between oxalate-degrading bacteria and related microorganisms, failing to provide a foundational microbial community or model for future research. Although this paper conducts a detailed study on oxalate metabolism, it would be beneficial to visually present the enrichment of different microbial community structures in metabolic pathways using graphical models.

      Furthermore, the authors have done a commendable job in studying the roles of key bacteria. If the interactions and effects of upstream and downstream metabolically related bacteria could be integrated, it would provide readers with even more meaningful information. By illustrating how these bacteria interact within the metabolic network, readers can gain a deeper understanding of the complex ecological and functional relationships within microbial communities. Such an integrated approach would not only enhance the scientific value of the study but also facilitate future research in this area.

    3. Reviewer #2 (Public review):

      Summary:

      Using the well-studied oxalate-microbiome-host system, the authors propose a novel conceptual and experimental framework for developing targeted bacteriotherapies using a three-phase pre-clinical workflow. The third phase is based on a 'complex system theoretical approach' in which multi-omics technologies are combined in independent in vivo and in vitro models to successfully identify the most pertinent variables that influence specific phenotypes in diet-host-microbe systems. The innovation relies on the third phase since phase I and phase II are the dominant approaches everyone in the microbiome field uses.

      Strengths:

      The authors used a multidisciplinary approach which included:

      (1) fecal transplant of two distinct microbial communities into Swiss-Webster mice (SWM) to characterize the host response (hepatic response-transcriptomics) and microbial activity (untargeted metabolomics of the stool samples) to different oxalate concentrations;

      (2) longitudinal analysis of the N. albigula gut microbiome composition in response to varying concentrations of oxalate by shotgun metagenomics, with deep bioinformatic analyses of the genomes assembled; and

      (3) development of synthetic microbial communities around oxalate metabolism and evaluation of these communities' activity in oxalate degradation in vivo.

      Weaknesses:

      However, I have concerns about the frame the authors tried to provide for a 'complex system theoretical approach' and how the data are interpreted within this frame. Several of the conclusions the authors provide do not seem to have sufficient data to support them.

    1. eLife Assessment

      This important study investigates the function of a critical regulator of human early cardiac development. The convincing examination of GATA6 function is thorough and well-executed. The study will be of interest to scientists working on how the human heart acquires its identity.

    2. Reviewer #1 (Public review):

      Summary:

      This is a comprehensive study that clearly and deeply investigates the function of GATA6 in human early cardiac development.

      Strengths:

      This study combines hESC engineering, differentiation, detailed gene expression, genome occupancy, and pathway modulation to elucidate the role of GATA6 in early cardiac differentiation. The work is carefully executed and the results support the conclusions. The use of publicly available data is well integrated throughout the manuscript. The RIME experiments are excellent.

      Weaknesses:

      Much has been known about GATA6 in mesendoderm development, and this is acknowledged by the authors.

      Comments on revised version:

      The authors have addressed my comments appropriately.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Bisson et al. describes the role of GATA6 in regulating cardiac progenitor cell (CPC) specification and cardiomyocyte (CM) generation using human embryonic stem cells (hESCs). The authors found that GATA6 loss-of-function hESCs exhibit early defects at the mesendoderm and lateral mesoderm patterning stages. Using RNA-seq and CUT&RUN assays, genes of the Wnt and BMP programs were found to be affected by the loss of GATA6 expression. Modulating Wnt and BMP during early cardiac differentiation can partially rescue CPC and CM defects in GATA6 hetero- and homozygous mutant hESCs.

      Strengths:

      The studies performed were rigorous and the rationale for the experimental design was logical. The results obtained were clear and support the conclusions that the authors made regarding the role of GATA6 in Wnt and BMP pathway gene expression.

      Weaknesses:

      Given the wealth of studies that have been performed in this research area previously, the amount of new information provided in this study is relatively modest. Nevertheless, the results are quite clear and should make a strong contribution to the field.

      Comments on revised version:

      The authors have addressed the prior request to assess the expression of genes representing each stage of development/differentiation from mesoderm to cardiac progenitor to cardiomyocytes and confirmed that the differentiation defect lies at the cardiac progenitor and cardiomyocyte stages and not in mesodermal differentiation. This work has significantly improved the robustness of the study.

    4. Reviewer #3 (Public review):

      In this study, Bisson et al. analyzed the role of the GATA6 transcription factor in patterning the early mesoderm and generating cardiomyocytes, using human embryonic stem cell differentiation assays and patient-derived hiPSCs with heart defects associated with mutations in the GATA6 gene. They identified a novel role for GATA6 in regulating genes involved in the WNT and BMP pathways. Modulation of the WNT and BMP pathways partially rescues early cardiac mesoderm defects in GATA6 mutant hESCs. These results provide significant insights into how GATA6 loss-of-function and heterozygous mutations contribute to heart defects.

      Comments on revised version:

      The authors have addressed all the concerns, using new data and modifications to the text to further strengthen the manuscript.

    1. eLife Assessment

      The study by Pudlowski et al. shows that a previously-identified protein complex, composed of delta- and epsilon-tubulin together with TEDC1 and TEDC2, functions in generating centriolar triplet microtubules, and that this is crucial for the proper formation of centriolar subdomains and the stability of centrioles throughout the cell cycle. This is an important study that advances our understanding of centriole biogenesis and structure and is supported by convincing evidence based on knockout cell lines, immunoprecipitation, and ultrastructure expansion microscopy. The work is of interest to cell biologists, in particular researchers with interest in centrosome biology.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Pudlowski et al. investigates how the intricate structure of centrioles is formed by studying the role of a complex formed by delta- and epsilon-tubulin and the TEDC1 and TEDC2 proteins. For this, they employ knockout cell lines, EM, and ultrastructure expansion microscopy as well as pull-downs. Previous work has indicated a role of delta- and epsilon-tubulin in triplet microtubule formation. Without triplet microtubules centriolar cylinders can still form, but are unstable, resulting in futile rounds of de novo centriole assembly during S phase and disassembly during mitosis. Here the authors show that all four proteins function as a complex and knockout of any of the four proteins results in the same phenotype. They further find that mutant centrioles lack inner scaffold proteins and contain an extended proximal end including markers such as SAS6 and CEP135, suggesting that triplet microtubule formation is linked to limiting proximal end extension and formation of the central region that contains the inner scaffold. Finally, they show that mutant centrioles seem to undergo elongation during early mitosis before disassembly, although it is not clear if this may also be due to prolonged mitotic duration in mutants.

      Strengths:

      Overall this is a well-performed study, well presented, with conclusions supported by convincing data based on knockout cell lines, rescue experiments, and detailed quantifications.

      Weaknesses:

      Most weaknesses have been addressed in the revised version. The precise mapping of TED complex proteins to centrioles remains challenging with the available tools but has been addressed through the use of several complementary super-resolution techniques.

    3. Reviewer #2 (Public review):

      Summary:

      In this article, the authors study the function of TEDC1 and TEDC2, two proteins previously reported to interact with TUBD1 and TUBE1. Previous work by the same group had shown that TUBD1 and TUBE1 are required for centriole assembly and that human cells lacking these proteins form abnormal centrioles that have only singlet microtubules and disintegrate in mitosis. In this new work, the authors demonstrate that TEDC1 and TEDC2 depletion results in the same phenotype, with abnormal centrioles that also disintegrate in mitosis. In addition, they were able to localize these proteins to the proximal end of the centriole, a result not previously achieved with TUBD1 and TUBE1, providing a better understanding of where and when the complex is involved in centriole growth.

      Strengths:

      The results are very convincing, particularly the phenotype, which is the same as previously observed for TUBD1 and TUBE1. The U-ExM localization is also convincing: although the signal is not very homogeneous, it is clear that the complex is in the proximal region of the centriole and procentriole. The phenotype observed in U-ExM on the elongation of the cartwheel is also spectacular and opens the question of the regulation of the size of this structure. The authors also report convincing results on direct interactions between TUBD1, TUBE1, TEDC1, and TEDC2, and an intriguing structural prediction suggesting that TEDC1 and TEDC2 form a heterodimer that interacts with the TUBD1-TUBE1 heterodimer.

      Comments on revisions:

      I would like to thank the authors for their work and for thoroughly addressing most of my questions. I extend my congratulations to the authors for this excellent and impactful article.

    4. Reviewer #3 (Public review):

      Summary:

      Human cells deficient in delta-tubulin or epsilon-tubulin form unstable centrioles, which lack triplet microtubules and undergo a futile formation and disintegration cycle. In this study, the authors show that human cells lacking the associated proteins TEDC1 or TEDC2 have identical phenotypes. They use genetics to knock out TEDC1 or TEDC2 in p53-negative RPE-1 cells and expansion microscopy to structurally characterize mutant centrioles. Biochemical methods and AlphaFold-multimer prediction software are used to investigate interactions between tubulins and TEDC1 and TEDC2.

      The study shows that mutant centrioles are built only of A tubules, which elongate and extend their proximal region, fail to incorporate structural components, and finally disintegrate in mitosis. In addition, they demonstrate that delta-tubulin, epsilon-tubulin, TEDC1, and TEDC2 form one complex and that TEDC1 and TEDC2 can interact independently of tubulins. Finally, they show that localization of the four proteins is mutually dependent.

      Strengths:

      The results presented here are convincing, exciting, and important, and the manuscript is well-written. The study shows that delta-tubulin, epsilon-tubulin, TEDC1, and TEDC2 function together to build a stable and functional centriole, significantly contributing to the field and our understanding of the centriole assembly process.

      Weaknesses:

      The ultrastructural characterization of TEDC1 and TEDC2 in centrosomes remains challenging. Nevertheless, it is evident that these proteins occupy growing centrioles and the proximal parts of mother centrioles.

      Comments on revisions:

      The authors have done a great job extending the original experiments and measurements and answering outstanding questions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The study by Pudlowski et al. investigates how the intricate structure of centrioles is formed by studying the role of a complex formed by delta- and epsilon-tubulin and the TEDC1 and TEDC2 proteins. For this, they employ knockout cell lines, EM, and ultrastructure expansion microscopy as well as pull-downs. Previous work has indicated a role of delta- and epsilon-tubulin in triplet microtubule formation. Without triplet microtubules centriolar cylinders can still form, but are unstable, resulting in futile rounds of de novo centriole assembly during S phase and disassembly during mitosis. Here the authors show that all four proteins function as a complex and knockout of any of the four proteins results in the same phenotype. They further find that mutant centrioles lack inner scaffold proteins and contain an extended proximal end including markers such as SAS6 and CEP135, suggesting that triplet microtubule formation is linked to limiting proximal end extension and formation of the central region that contains the inner scaffold. Finally, they show that mutant centrioles seem to undergo elongation during early mitosis before disassembly, although it is not clear if this may also be due to prolonged mitotic duration in mutants.  

      Strengths:  

      Overall this is a well-performed study, well presented, with conclusions mostly supported by the data. The use of knockout cell lines and rescue experiments is convincing.  

      Weaknesses:  

      In some cases, additional controls and quantification would be needed, in particular regarding cell cycle and centriole elongation stages, to make the data and conclusions more robust. 

      We thank the reviewer for these comments and have improved our analyses of these as detailed below.

      Reviewer #2 (Public Review):  

      Summary:  

      In this article, the authors study the function of TEDC1 and TEDC2, two proteins previously reported to interact with TUBD1 and TUBE1. Previous work by the same group had shown that TUBD1 and TUBE1 are required for centriole assembly and that human cells lacking these proteins form abnormal centrioles that only have singlet microtubules that disintegrate in mitosis. In this new work, the authors demonstrate that TEDC1 and TEDC2 depletion results in the same phenotype with abnormal centrioles that also disintegrate into mitosis. In addition, they were able to localize these proteins to the proximal end of the centriole, a result not previously achieved with TUBD1 and TUBE1, providing a better understanding of where and when the complex is involved in centriole growth.  

      Strengths:  

      The results are very convincing, particularly the phenotype, which is the same as previously observed for TUBD1 and TUBE1. The U-ExM localization is also convincing: despite a signal that's not very homogeneous, it's clear that the complex is in the proximal region of the centriole and procentriole. The phenotype observed in U-ExM on the elongation of the cartwheel is also spectacular and opens the question of the regulation of the size of this structure. The authors also report convincing results on direct interactions between TUBD1, TUBE1, TEDC1, and TEDC2, and an intriguing structural prediction suggesting that TEDC1 and TEDC2 form a heterodimer that interacts with the TUBD1-TUBE1 heterodimer.

      Weaknesses:  

      The phenotypes observed in U-ExM on cartwheel elongation merit further quantification, enabling the field to appreciate better what is happening at the level of this structure.  

      We thank the reviewer for these comments and have improved our analyses of cartwheel elongation as detailed below.

      Reviewer #3 (Public Review):  

      Summary:  

      Human cells deficient in delta-tubulin or epsilon-tubulin form unstable centrioles, which lack triplet microtubules and undergo a futile formation and disintegration cycle. In this study, the authors show that human cells lacking the associated proteins TEDC1 or TEDC2 have these identical phenotypes. They use genetics to knockout TEDC1 or TEDC2 in p53-negative RPE-1 cells and expansion microscopy to structurally characterize mutant centrioles. Biochemical methods and AlphaFold-multimer prediction software are used to investigate interactions between tubulins and TEDC1 and TEDC2.

      The study shows that mutant centrioles are built only of A tubules, which elongate and extend their proximal region, fail to incorporate structural components, and finally disintegrate in mitosis. In addition, they demonstrate that delta-tubulin or epsilon-tubulin and TEDC1 and TEDC2 form one complex and that TEDC1 and TEDC2 can interact independently of tubulins. Finally, they show that the localization of the four proteins is mutually dependent.

      Strengths:  

      The results presented here are mostly convincing, the study is exciting and important, and the manuscript is well-written. The study shows that delta-tubulin, epsilon-tubulin, TEDC1, and TEDC2 function together to build a stable and functional centriole, significantly contributing to the field and our understanding of the centriole assembly process.  

      Weaknesses:  

      The ultrastructural characterization of TEDC1 and TEDC2 obtained by U-ExM is inconclusive. Improving the quality of the signals is paramount for this manuscript.  

      We thank the reviewer for these comments and have improved our imaging of TEDC1 and TEDC2 localization, as detailed below.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):  

      The reviewers agreed that the conclusions are largely supported by solid evidence, but felt that improving the following aspects would make some of the data more convincing:  

      (1) The U-ExM localizations of TEDC1/2 are not very convincing, and the reviewers suggest complementing these with alternative super-resolution approaches (e.g. SIM) and/or different labeling techniques such as pre-expansion labeling using STAR red/orange secondaries (also robust for SIM and STED), use of the Halo tag, different tag antibodies, etc.

      We thank the reviewers for these recommendations and have adapted two of these strategies to improve our imaging of TEDC1 and TEDC2 localization. First, we used an alternative super-resolution approach, a Yokogawa CSU-W1 SoRA confocal scanner (resolution = 120 nm) and imaged cells grown on coverslips (not expanded). We found that TEDC1 and TEDC2 localize to procentrioles and the proximal end of parental centrioles (Fig 2 – Supplementary Figure 1a, b). Second, we used a recently described expansion gel chemistry (Kong et al., Methods Mol Biol 2024) combined with Abberior Star red and orange secondary antibodies. This technique resulted in robust signal at centrosomes and in the cytoplasm and indicated that TEDC1 and TEDC2 localize near the centriole walls of procentrioles and the proximal region of parental centrioles, near CEP44 (Fig 2 – Supplementary Figure 1c, d). These results complement and support our initial observations (Fig 2C, D) and we have edited the text to reflect this (lines 157-163). We also note that these Flag tag and V5 tag primary antibodies are specific and have little background signal in all applications (Fig 2 – Supplementary Fig 1E-J), while other commercially available antibodies against these tags did exhibit non-specific signal. 

      (2) The cell cycle classifications of centrioles would strongly benefit, apart from a better description, from adding quantifications of average centriole length at a given stage based on tubulin staining (not acTub). 

      We thank the reviewers for these recommendations. We have added an improved description of our cell cycle analyses (lines 234-237). We have also added new analyses for centriole length as measured by staining with alpha-tubulin (Fig 4 – Supp 3 and Fig 4 – Supp 4). We find that in all mutants, acetylated tubulin elongates along with alpha-tubulin in a similar way as control centrioles.

      Reviewer #1 (Recommendations For The Authors):  

      Specific points:  

      (1) The introduction is a bit oddly structured. About halfway through it summarizes what is going to be presented in the study, giving the impression that it is about to conclude, but then continues with additional, detailed introduction paragraphs. Overall, the authors may also want to consider making it more concise.

      We thank the reviewer for these suggestions and have shortened and restructured the introduction for clarity and conciseness.

      (2) The text should explain to the non-expert reader why endogenous proteins are not detected and why exogenously expressed, tagged versions are used. Related to this, the authors state overexpression, but what is this assessment based on? Does expression at the endogenous level also rescue? At least by western blot, these questions should be addressed. 

      In the text, we have added clarification about why endogenous proteins were not detected for immunofluorescence (lines 149-151). To quantify the overexpression, we have added Western blots of TEDC1 and TEDC2 to Fig 1 – Supplementary Figure 1E,F. We note that endogenous levels of both proteins are very low, and the rescue constructs are overexpressed 20 to 70 fold above endogenous levels.  

      (3) The figures should clearly indicate when tagged proteins are used and detected. Currently, this info is only found in the legends but should be in the figure panels as well.

      We have made these changes to the figure panels in Fig 2, Fig 2 – Supp 1, and Fig 3.

      (4)  I could not find a description and reference to Figure 2 Supplement 2 and 3. 

      We have replaced these supplements with new supplementary figures for TEDC1 and TEDC2 localization (Fig 2 – Supp 1).

      (5) The multiple bands including unspecific (?) bands should be labeled to guide the reader in the western blots. 

      We have labeled nonspecific bands in our Western blots with asterisks (Fig 1 – Supp 1, Fig 3).

      (6) The AlphaFold prediction suggests that TUBD1 can bind to the TED complex in the absence of TUBE1. Can this be shown? This would be a nice validation of the predicted architecture of the complex. I also missed a bit of a discussion of the predicted architecture. How could it be linked to triplet microtubule formation? Is the latest AlphaFold version 3 adding anything to this analysis?

      In our pulldown experiments, we found that TUBD1 cannot bind to TEDC1 or TEDC2 in the absence of TUBE1 (Fig 3C, D, IB: TUBD1). We performed this experiment with three biological replicates and found the same result. It is possible that TUBD1 and TUBE1 form an intact heterodimer, similar to alpha-tubulin and beta-tubulin, and this will be an exciting area of future research.

      We have added new analysis from AlphaFold3 (Fig 3 – Supp 1B). AlphaFold3 predicts a similar structure as AlphaFold Multimer.

      We have also added additional discussion about the AlphaFold prediction to the text (lines 220-222, 365-367). Thanks to the reviewer for pointing out this oversight.

      (7) I suggest briefly explaining in the text how cells and centrioles at different cell cycle stages were identified. I found some info in the legend of Figure 1, but no info for other figures or in the text. Related to this, how are procentrioles defined in de novo formation? There is no parental centriole to serve as a reference. 

      We have added a brief explanation of the synchronization and identification in lines 234-237. We have also clarified the text regarding de novo centrioles, and now term these “de novo centrioles in the first cell cycle after their formation” (lines 271-272).

      (8) Related to point 7: using acetylated tubulin as a universal length and width marker seems unreliable since it is a PTM. The authors should use general tubulin staining to estimate centriole dimensions, or at least establish that acetylated tubulin correlates well with the overall tubulin signal in all mutants. 

      We have added two supplementary data figures (Fig 4 – supp 3 and Fig 4 – supp 4) in which we co-stain control and mutant centrioles with alpha-tubulin. We found that acetylated tubulin marked mutant centrioles well and as alpha-tubulin length increased, acetylated tubulin length also increased. 

      (9) Presence and absence of various centriolar proteins. These analyses lack a clear reference for the precise centriole elongation stage. This is particularly problematic for proteins that are recruited at specific later stages (such as inner scaffold proteins). The staining should be correlated with centriole length measurements, ideally using general tubulin staining.  

      As described for point 8, we have added two supplementary data figures in which we co-stain control and mutant centrioles with alpha-tubulin and found that acetylated tubulin also increases as overall tubulin length increases in all mutants. We note that inner scaffold proteins are absent in all our mutant centrioles at all stages of the cell and centriole cycle, as also previously reported for POC5 in Wang et al., 2017.

      Reviewer #2 (Recommendations For The Authors):  

      Here's a list of points I think could be improved:  

      -  As the authors previously published, the centriole appears to have a smaller internal diameter than mature centrioles. Could the authors measure to see if the phenotype is identical? Is the centriole blocked in the bloom phase (Laporte et al. 2024)? 

      We have added an additional supplementary figure (Fig 4 – supp 5) to show that mutant centrioles have smaller diameters than mature centrioles, as we previously reported for the delta-tubulin and epsilon-tubulin mutant centrioles by EM. We thank the reviewers for the additional question of the bloom phase. Given the comparatively smaller number of centrioles we analyzed in this paper compared to Laporte et al (50 to 80 centrioles per condition here, versus 800 centrioles in Laporte et al), it is difficult to definitively conclude whether there is a block in bloom phase. This would be an interesting area for future research.  

      -  The images of the centrioles in EM are beautiful. Would it be possible to apply a symmetrisation on it to better see the centriolar structures? For example, is the A-C linker present? 

      We thank the reviewer for this excellent suggestion. Using centrioleJ, we find that the A-C linker is absent from mutant centrioles. The symmetrized images have been added to Fig 1 – Supplementary Fig 2, and additional discussion has been added to the text (line 143-144, line 368-374).  

      -  How many EM images were taken? Did the centrioles have 100% A-microtubule only or sometimes with B-MT? 

      For TEM, we focused on centrioles that were positioned to give perfect cross-section images of the centriolar microtubules, and thus did not take images of off-angle or rotated centrioles. Given the difficulty of this experiment (centrioles are small structures within the cell, centrosomes are single-copy organelles, and off-angle centrioles were not imaged), we were lucky to image 3 centrioles that were in perfect cross-section – 2 for Tedc1<sup>-/-</sup> and 1 for Tedc2<sup>-/-</sup>. Our images indicate that these centrioles only have A-tubules (Fig 1 – Supp Fig 2).

      -  In Figure 2 - it would be preferable to write TEDC2-flag or TEDC1-flag and not TEDC2/1. 

      We have made this change.

      -  It seems that Figures 2C and D aren't cited, and some of the data in the supplemental data are not described in the main text. 

      We have replaced these supplements with new supplementary figures for TEDC1 and TEDC2 localization (Fig 2 – Supp 1).

      -  The signal in U-ExM with the anti-Flag antibody is heterogeneous. Did the authors test several anti-FLAG antibodies in U-ExM? 

      We tested several anti-Flag and anti-V5 antibodies for our analyses, and chose these because they have little background signal in all applications (Fig 2 – Supplementary Fig 1E-J). Other commercially available antibodies against these tags did exhibit non-specific signal.

      -  The AlphaFold prediction is difficult to interpret, the authors should provide more views and the PDB file. 

      We have added 2 additional views of the AlphaFold prediction in Fig 3 – Supp 1A.

      -  In general, but particularly for Figure 4: the length doesn't seem to be divided by the expansion factor, it is therefore difficult to compare with known EM dimensions. Can the authors correct the scale bars? 

      We have corrected the scale bars for all figures to account for the expansion factor.
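
      The correction described here amounts to dividing each length measured on the expanded gel by the expansion factor. The following is a minimal sketch of that arithmetic, not the authors' analysis code. The expansion factor of 4.0 and the measured lengths are hypothetical values chosen for illustration; neither is reported in this response.

```python
def to_pre_expansion_nm(measured_nm: float, expansion_factor: float) -> float:
    """Convert a length measured on an expanded sample to its pre-expansion length."""
    if expansion_factor <= 0:
        raise ValueError("expansion factor must be positive")
    return measured_nm / expansion_factor


# Hypothetical values for illustration only (not taken from the manuscript).
EXPANSION_FACTOR = 4.0
measured_lengths_nm = [1600.0, 2000.0]

# Lengths rescaled to pre-expansion dimensions, comparable with EM measurements.
corrected_lengths_nm = [
    to_pre_expansion_nm(length, EXPANSION_FACTOR) for length in measured_lengths_nm
]
```

      The same division applies to scale bars: a bar drawn for a given length in the expanded image represents that length divided by the expansion factor in the original sample.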

      -  Concerning Gamma-tubulin that is "recruited to the lumen of centrioles by the inner scaffold, had localization defects in mutant centrioles. However, we were unable to reliably detect gamma-tubulin within the lumen of control or de novo-formed centrioles in S or G2-phase (Figure 4 - Supplement 1E), and thus were unable to test this hypothesis". In Laporte et al 2024, Gamma-tubulin arrives later than the inner scaffold and only on mature centrioles, so this result appears to be in line with previous observation. However, the authors should be able to detect a proximal signal under the microtubules of the procentriole, is this the case? 

      We agree that this is an exciting question. However, in our expansion microscopy staining, we frequently observe that gamma-tubulin surrounds centrioles, corresponding to its role in the pericentriolar material (PCM). In our hands, we find it difficult to distinguish between centriolar gamma-tubulin at the base of the A-tubule from gamma-tubulin within the PCM.  

      -  In the signal elongation of SAS-6, STIL, CEP135, CPAP, and CEP44, would it be possible to quantify the length of these signals (with dimensions divided by the expansion factor for comparison with known TEM distances)? 

      We have quantified the lengths of SAS-6 and CEP135 in new Fig 4 – Supp 3 and Fig 4 – Supp 4.  

      -  The authors observe that centrin is present, but only as a SFI1 dot-like localization (which is another protein that would be interesting to look at), and not an inner scaffold localization. Can the authors elaborate? These results suggest that the distal part is correctly formed with only a microtubule singlet. 

      We agree with the reviewer’s interpretation that the centriole distal tip is likely correctly formed with only singlet microtubules, as both distal centrin and CP110 are present. We have added this point to the discussion (line 415).

      -The authors observe that CPAP is elongated, but CPAP has two locations, proximal and distal. Is it distal or proximal elongation? Is the proximal signal of CPAP longer than that of CEP44 in the mutants? The authors discuss that the elongation could come from overexpression of CPAP, but here it seems that the centriole is not overlong, just the structures around the cartwheel. 

      We thank the reviewer for this point. It is difficult for us to conclude whether the proximal or distal region is extended in the mutants, as our mutant centrioles lack a visible separation between these two regions. It would be interesting to probe this question in the future by testing whether subdomains of CPAP may be differentially regulated in our mutants.

      Reviewer #3 (Recommendations For The Authors):  

      It isn't apparent to me what was counted in Figure 1C. Were all centrioles (mother centrioles and procentrioles) counted? Where is the 40% in control cells coming from? Can this set of data be presented differently? 

      We apologize for the confusion. In this figure, all centrioles were counted. We have updated the figure legend for clarity. We performed this analysis in a similar way as in Wang et al., 2017 to better compare phenotypes.  

      Figure 2C. and the text lines 182-187: The ultrastructural characterization of TEDC1 and TEDC2 suffers from the low quality of the TEDC1 and TEDC2 signals obtained post-expansion. In comparison with robust low-resolution immunosignal, it appears that most of the signal cannot be recovered after expansion. Another sub-resolution imaging method to re-analyze TEDC1 and TEDC2 localization would be essential. The same concern applies to Figures 2 - Supplement 2 and 3. Also, Figure 2 - Supplement 2 and Supplement 3 do not seem to be cited.

      We thank the reviewer for these recommendations. As also mentioned above, we used an alternative super-resolution approach, a Yokogawa CSU-W1 SoRA confocal scanner (resolution = 120 nm), and found that TEDC1 and TEDC2 localize to procentrioles and the proximal end of parental centrioles (Fig 2 – Supplementary Figure 1a, b). Second, we used a recently described expansion gel chemistry (Kong et al., Methods Mol Biol 2024) combined with Abberior Star red and orange secondary antibodies. This technique resulted in robust signal at centrosomes and in the cytoplasm and indicated that TEDC1 and TEDC2 localize near the centriole walls of procentrioles and the proximal region of parental centrioles, near CEP44 (Fig 2 – Supplementary Figure 1c, d). These stainings complement and support our initial observations (Fig 2C, D) and we have edited the text to reflect this (lines 157-163). We have also removed the supplementary figures that were uncited in the text.

      TUBD1 and TUBE1 form a dimer and TEDC2 and TEDC1 can interact. Any speculation as to why TEDC2 does not pull down both TUBE1 and TUBD1? 

      We apologize for the confusion. TEDC2 does pull down both TUBE1 and TUBD1 (Fig 3D, pull-down, second column, Tedc2-V5-APEX2 rescuing the Tedc2<sup>-/-</sup> cells pulls down TUBD1, TUBE1, and TEDC1).  

      Figure 4A and B. The authors use acetylated tubulin to determine the length of procentrioles in the S and G2 phases. However, procentrioles are not acetylated on their distal ends in these cell cycle phases (as the authors also mention further in the text). Why has alpha tubulin not been used since it works well in U-ExM? The average size of the control, G2 procentrioles, seems too small in Figure 4A and not consistent with other imaging data (for instance, in Figure 4 - Supplement 1 C, Cp110, and CPAP staining). There is no statistical analysis in Figure 4A.

      We have added two supplementary data figures (Fig 4 – supp 3 and Fig 4 – supp 4) in which we co-stain control and mutant centrioles with alpha-tubulin. We found that acetylated tubulin correlates well with overall tubulin signal in all mutants. We have added statistical analysis to the figure legend of Fig 4A.

      Lines 260 - 262: "These results indicate that centrioles with singlet microtubules can elongate to the same length as controls, and therefore that triplet microtubules are not essential for regulating centriole length." It is hard to agree with this statement. Mutant procentrioles show aberrantly elongated proximal signals of several tested proteins. In addition, in lines 326 - 328, the authors state that "Together, these results indicate that centrioles lacking compound microtubules are unable to properly regulate the length of the proximal end."  

      We thank the reviewer and have clarified the statement to state that these results indicate that centrioles with singlet microtubules can elongate to the same overall length as control centrioles in G2 phase.  

      Line 353: The authors suggest that elongated procentriole structure in mitosis may represent intermediates in centriole disassembly. Another interpretation, more in line with the EM data from Wang et al., 2017, would be that these mutant procentrioles first additionally elongate before they disassemble in late mitosis. The aberrant intermediate structure concept would need further exploration. For instance, anti-alpha/beta-tubulin antibodies could be used to investigate centriole microtubules.  

      We apologize for the confusion and have edited this section for clarity (lines 341-343): “We conclude that in our mutant cells, centrioles elongate in early mitosis to form an aberrant intermediate structure, followed by fragmentation in late mitosis.”

      References need to be included in lines 122, 277, 279. 

      We have added these references.

      Line 281: Add references PMID: 30559430 and PMID: 32526902.  

      We have added these references (lines 265-266).

      Line 289: "Moreover, our results suggest that centriole glutamylation is a multistep process, in which long glutamate side chains are added later during centriole maturation." This does not seem like an original observation. For instance, see PMID: 32526902.  

      We have added this reference (lines 273-274).

    1. eLife Assessment

      This manuscript describes an important finding of the transcriptional control of a chimeric gene transfer agents (GTA) cluster in Bartonella by a processive anti-termination factor (BrrG). The evidence provided is solid. This manuscript will interest researchers working on transcriptional regulation, horizontal gene transfer, and phages.

    2. Reviewer #1 (Public review):

      Summary:

      Gene transfer agent (GTA) from Bartonella is a fascinating chimeric GTA that evolved from the domestication of two phages. Not much is known about how the expression of the BaGTA is regulated. In this manuscript, Korotaev et al noted the structural similarity between BrrG (a protein encoded by the ror locus of BaGTA) to a well-known transcriptional anti-termination factor, 21Q, from phage P21. This sparked the investigation into the possibility that BaGTA cluster is also regulated by anti-termination. Using a suite of cell biology, genetics, and genome-wide techniques (ChIP-seq), Korotaev et al convincingly showed that this is most likely the case. The findings offer the first insight into the regulation of GTA cluster (and GTA-mediated gene transfer) particularly in this pathogen Bartonella. Note that anti-termination is a well-known/studied mechanism of transcriptional control. Anti-termination is a very common mechanism for gene expression control of prophages, phages, bacterial gene clusters, and other GTAs, so in this sense, the impact of the findings in this study here is limited to Bartonella.

      Strengths:

      Convincing results that overall support the main claim of the manuscript.

      Weaknesses:

      A few important controls are missing.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identified and characterized a regulatory mechanism based on transcriptional anti-termination that connects the two gene clusters, capsid and run-off replication (ROR) locus, of the bipartite Bartonella gene transfer agent (GTA). Among genes essential for GTA functionality identified in a previous transposon sequencing project, they found a potential antiterminator of phage origin within the ROR locus. They employed fluorescence reporter and gene transfer assays of overexpression and knockout strains in combination with ChIP-seq and promoter-fusions to convincingly show that this protein indeed acts as an antiterminator counteracting attenuation of the capsid gene cluster expression.

      Impact on the field:

      The results provide valuable insights into the evolution of the chimeric BaGTA, a unique example of phage co-domestication by bacteria. A similar system found in the other broadly studied Rhodobacterales/Caulobacterales GTA family suggests that antitermination could be a general mechanism for GTA control.

      Strengths:

      Results of the selected and carefully designed experiments support the main conclusions.

      Weaknesses:

      It remains open why overexpression of the antiterminator does not increase the gene transfer frequency.

    4. Author response:

      Reviewer 1:

      (1) Provide RMSD and DALI scores to show how similar the AlphaFold-predicted structures of BrrG are to other anti-termination factors. This should be done for Fig1B and also for Suppl. Fig 1 to support the claim that BrrG, GafA, GafZ, Q21 share structural features.

      In the revised manuscript we will provide RMSD and DALI scores.

      (2) Throughout the manuscript, flow cytometry data of gfp expression was used and shown as a single replicate. Korotaev et al wrote in the legends that error bars are shown (that is not true for e.g. Figs. 3, 4, and 5). It is difficult for reviewers/readers to gauge how reliable their experiments are.

      As stated in the manuscript, all flow cytometry experiments were performed in triplicate. In the revised manuscript we will include the two replicates not presented in the main figures as supplementary information.

      (3) I am unsure how ChIP-seq in Fig. 2A was performed (with anti-FLAG or anti-HA antibodies? I cannot tell from the Materials & Methods). More importantly, I did not see the control for this ChIP-seq experiment. If a FLAG-tagged BrrG was used for ChIP-seq, then a WT non-tagged version should be used as a negative control (not sequencing INPUT DNA), this is especially important for anti-terminator that can co-travel with RNA polymerase. Please also report the number of replicates for ChIP-seq experiments.

      Fig. 2A presents a coverage plot from the ChIP-Seq of ∆brrG +pTet:brrG-3xFLAG (N’). A replicate of this N-terminally tagged construct will be added as supplementary data in the revised version. As anticipated by the referee, we had used ∆brrG +pTet:brrG (untagged) as control.

      (4) Korotaev et al mentioned that BrrG binds to DNA (as well as to RNA polymerase). With the availability of existing ChIP-seq data, the authors should be able to locate the DNA-binding element of BrrG, this additional information will be useful to the community.

      We will mine the ChIP-Seq data to define the BrrG binding site as closely as possible and include the analysis in the revised version of the manuscript.

      (5) Mutational experiments to break the potential hairpin structure are required to strengthen the claim that this putative hairpin is the potential transcriptional terminator.

      We did not claim that the identified hairpin is a terminator but rather suggested it as a candidate terminator. We agree with the referee that the proposed experiment would be necessary to definitively prove its terminator function. However, our primary aim was to demonstrate that BrrG acts as a processive antiterminator, which we have shown by replacing the putative terminator with a well-characterized synthetic terminator that BrrG successfully overcame. Therefore, we prefer not to conduct the proposed experiment and will instead further tone down our conclusions regarding the putative terminator function of the identified hairpin structure.

      Reviewer 2:

      (1) The authors wrote "GTAs are not self-transmitting because the DNA packaging capacity of a GTA particle is too small to package the entire gene cluster encoding it" (page 3). I thought that at least the Bartonella capsid gene cluster should be self-transmissible within the 14 kb packaged DNA (https://doi.org/10.1371/journal.pgen.1003393, https://doi.org/10.1371/journal.pgen.1000546). This was also concluded by Lang et al (https://doi.org/10.1146/annurev-virology-101416-041624). In this case the presented results would have important implications. As the gene cluster and the anti-terminator required for its expression are separated on the chromosome, it would not be possible to transfer an active GTA gene cluster, although the DNA coding for the genes required for making the packaging agent itself, theoretically fits into a BaGTA particle. Could the authors comment on that? I think it would be helpful to add the sizes of the different gene clusters and the distance between them in Fig. 2A. The ROR amplified region spans 500kb, is the capsid gene cluster within this region?

      We thank the reviewer for bringing up this interesting point. The bgt cluster (capsid cluster) is approximately 9.2 kb in size and could feasibly be packaged in its entirety into a GTA particle. In contrast, the ror gene cluster, which encodes the antiterminator BrrG, is approximately 20 kb in size—exceeding the packaging limit of GTA particles—and is separated from the bgt cluster by approximately 35 kb. Consequently, if the bgt cluster is transferred via a GTA particle into a recipient host that does not encode the ror gene cluster, the bgt cluster would not be expressed.

      (2) Another side-note regarding the introduction: On page three the authors write: "GTAs encode bacteriophage-like particles and in contrast to phages transfer random pieces of host bacterial DNA". While packaging is not specific, certain biases in the packaging frequency are observed in both studied GTA families. For Bartonella this is ROR. In the two GTA-producing strains D. shibae and C. crescentus origin and terminus of replication are not packaged and certain regions are overrepresented (https://doi.org/10.1093/gbe/evy005, https://doi.org/10.1371/journal.pbio.3001790). Furthermore, D. shibae plasmids are not packaged but chromids are. I think the term "random" does not properly describe these observations. I would suggest using "not specific" instead.

      We thank the reviewer for this suggestion and will adjust the wording accordingly.

      (3) Page 5: Remove "To address this". It is not needed as you already state "To test this hypothesis" in the previous sentence.

      We will adjust the wording accordingly.

      (4) I think the manuscript would greatly benefit from a summary figure to visualize the Q-like antiterminator-dependent regulatory circuit for GTA control and its four components described on pages 15 and 16.

      We thank the reviewer for this valuable suggestion and will include a summary figure illustrating the deduced regulatory mechanism in the revised manuscript.

      (5) Page 17: It might be worth noting that GafA is highly conserved among GTAs in Rhodobacterales (https://doi.org/10.3389/fmicb.2021.662907), as probably is its regulatory integration into the ctrA network (https://doi.org/10.3389/fmicb.2019.00803). It's an old mechanism. It would also be interesting to know if it is a common feature of the two archetypical GTAs that the regulator is not part of the cluster itself.

      We agree with the points raised by the reviewer and will address them in the revised manuscript. Specifically, we will highlight the high conservation of GafA in GTAs across Rhodobacterales and its regulatory integration within the ctrA network. Additionally, we will analyze whether the GafA-like antitermination regulator is typically located outside the regulated gene cluster, as we have demonstrated for BrrG of BaGTA in the Bartonellae.

    1. eLife Assessment

      This valuable study presents findings on DNA methylation as an efficient epigenetic transcriptional regulating strategy in bacteria. The authors utilized single-molecule real-time sequencing to profile the DNA methylation landscape across three model pathovars of Pseudomonas syringae, identifying significant epigenetic mechanisms through the Type-I restriction-modification system, which includes a conserved sequence motif associated with N6-methyladenine. The evidence presented is solid and the study provides novel insights into the epigenetic mechanisms of P. syringae, expanding the understanding of bacterial pathogenicity and adaptation.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. used SMRT sequencing to identify methylated nucleotides (6mA, 4mC, and 5mC) in the Pseudomonas syringae genome. They show that the most abundant modification is 6mA, and they identify the enzymes required for this modification: mutating HsdMSR leads to a decrease in 6mA. Interestingly, the mutant also shows altered pathogenicity, biofilm formation, and translation activity, due to a change in gene expression likely linked to the loss of 6mA.

      Overall, the paper represents an interesting set of new data that can bring forward the field of DNA modification in bacteria.

      Comments on revisions:

      Thank you for the additional work. The authors have now addressed all my concerns.

    3. Reviewer #2 (Public review):

      In the present manuscript, Huang et al. revealed the significant roles of the DNA methylome in regulating virulence and metabolism within Pseudomonas syringae, with a particular focus on the HsdMSR system in this model strain. The authors used SMRT-seq to profile the DNA methylation patterns (6mA, 5mC, and 4mC) in three P. syringae strains (Psph, Pss, and Psa) and displayed the conservation among them. They further identified the type I restriction-modification system (HsdMSR) in P. syringae, including its specific motif sequence. HsdMSR participated in metabolism and virulence (T3SS and biofilm formation), as demonstrated through RNA-seq analyses. Additionally, the authors revealed the mechanisms of transcriptional regulation by 6mA. Strictly from the point of view of the interest of the question and the work carried out, this is a worthy and timely study that uses third-generation sequencing technology to characterize DNA methylation in P. syringae. The experimental approaches were solid, and the results obtained were interesting and provided new information on how epigenetics influences transcription in P. syringae. The conclusions of this paper are mostly well supported by data.

      Comments on revisions:

      The authors have successfully addressed all the comments from the reviewers in their revised manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Huang et al. used SMRT sequencing to identify methylated nucleotides (6mA, 4mC, and 5mC) in the Pseudomonas syringae genome. They show that the most abundant modification is 6mA, and they identify the enzymes required for this modification: mutating HsdMSR leads to a decrease in 6mA. Interestingly, the mutant also shows altered pathogenicity, biofilm formation, and translation activity, due to a change in gene expression likely linked to the loss of 6mA. Overall, the paper represents an interesting set of new data that can bring forward the field of DNA modification in bacteria.

      Thank you for your valuable feedback on our paper exploring the impact of 6mA modification in P. syringae.

      Major Concerns:

      Most of the authors' data concern the Psph pathovar. I am not sure that the authors' conclusions are supported by the two other pathovars they used in the initial 2 figures. If the authors want to broaden their conclusions to Pseudomonas syringae and not restrict them to Psph, they should have stronger methylation data using replicates. Additionally, they should discuss why Pss is so different from Pst and Psph. Could they do a blot to confirm it is really the case and not a sequencing artifact? Is the change of methylation during bacterial growth conserved between the pathovars? The authors should obtain mutants in the other pathovars to see if they have the same phenotype. The authors have a nice set of data concerning Psph, but broadening the results to other pathovars requires further investigation.

      We appreciate the reviewer’s insightful comments. While the majority of our data focuses on the Psph, we recognize the importance of validating these findings in Pss and Pst. To this end, we have performed additional experiments using dot blot and mutant construction to enhance our conclusions in other pathovars.

      We agree that we should discuss why Pss is different from Psph and Pst. We performed a dot blot assay using genomic DNA from Pss and Pst, presented in Figure S5A. We also compared the 6mA modification levels of Pss and Pst in different growth phases. As shown in Figure S5A, the change of methylation during bacterial growth is conserved in Pst. The change was not obvious in Pss, which might be due to the lack of a type I R-M system.

      “In accordance with previous studies showing that growth conditions affect the bacterial methylation status, we applied dot blot experiments using the same amount of DNA (1 μg) from these three P. syringae strains to detect the 6mA levels during both logarithmic and stationary phases. The results revealed that 6mA levels in the stationary phase were much higher than in the logarithmic phase in Psph and Pst, but showed no significant change in Pss. Additionally, we found that during the stationary phase, 6mA methylation levels in Psph and Pst were higher than those in Pss. These findings were consistent with the MTase predictions for these three strains, since Pss does not harbor any type I R-M systems, which are important for 6mA modification in bacteria.”

      Please see Figure S5A and Lines 220-228 in the revised manuscript.

      We also tried to construct an HsdM mutant in Pst to explore whether the influence of 6mA methylation was conserved in P. syringae, but it failed after multiple attempts. We did not construct a Pss mutant because no type I R-M system was predicted, and few methylation sites were identified via SMRT-seq in this strain. Therefore, we overexpressed HsdM in Pst instead. We have performed additional experiments in WT and the HsdM overexpression strains, including dot blot and growth curve assays.

      Please see Figures S5B-C and Lines 228-232 in the revised manuscript.

      The authors should include proper statistical analysis of their data. A lot of terms are descriptive but not supported by a deeper analysis to sustain the conclusions. For example, in Figure 4E, we do not know if the overlap is significant or not. Are DEGs more overlapping to 6mA sites than non-DEGs? Here is a non-exhaustive list of terms that need to be supported by statistics: different level (L145), greater conservation (L162), significant conservation (L165), considerable similarity (L175), credible motifs (L189), Less strong (L277) and several "lower" and "higher" throughout the text.

      Thank you for the insightful feedback. We have made the following revisions in the manuscript to ensure that the terms are more precise and do not require statistical significance testing.

      (1) Statistical analysis: We have added statistical tests for the overlap between DEGs and 6mA sites in Figure 4E. We performed the Fisher test and found that the overlap was not significant (p > 0.05); neither DEGs nor non-DEGs overlapped significantly with 6mA sites. Please see Figure 4E and Lines 261-262.

      “Less strong” was used to indicate the influence of HsdM on biofilm in Figure 5D. All figures with “*” labels were analyzed using two-tailed Student's t-tests, where “*” marks a significant change (p < 0.05).

      (2) Revised wording: For terms used to describe comparisons, we have revised the wording to be clearer and ensure that the terminology used did not imply the need for statistical significance testing when not required. For example:

      “Different level” has been removed. Please see Lines 143-144.

      “Greater conservation” has been revised to “more conserved functional terms”. Please see Lines 161-162.

      “Significant conservation” has been revised to “notable conservation”. Please see Line 165.

      “Credible motifs” has been revised to “identified motifs”. Please see Line 186.
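
      The overlap test mentioned under item (1) can be illustrated with a small sketch. The response does not say how the test was set up, so the 2x2 layout (DEG or non-DEG versus carrying a 6mA site or not), the function name, and the counts below are all assumptions made for illustration; they are not the study's data. The p-value is computed directly from the hypergeometric distribution so that the example needs only the standard library.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)

# Hypothetical counts, for illustration only:
#              with 6mA site   without 6mA site
# DEGs               30               70
# non-DEGs          300              700
p_value = fisher_exact_two_sided(30, 70, 300, 700)
print(f"p = {p_value:.3f}")
```

      In this made-up table the fraction of genes carrying a 6mA site is the same for DEGs and non-DEGs, so the test finds no enrichment, which is the kind of outcome the response reports.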

      The authors performed SMRT sequencing of the delta hsdMSR mutant, showing a reduction of 6mA. Could they include a description of their results similar to Figures 1-2? How reduced is the 6mA level? Is it everywhere in the genome? Does it affect other methylation marks? This analysis would strengthen their conclusions.

      Yes, we agree. We have provided additional analysis and descriptions to strengthen the conclusions in response to these valuable comments. We determined the three types of methylation sites in the HsdMSR mutant strain and compared the overlapping genes within these modification patterns. Specifically, we focused on the 6mA sites in Psph WT, the HsdMSR mutant, and the HsdM motif CAGCN<sub>(6)</sub>CTC. As expected, we found that almost all of the 6mA sites lost in ΔhsdMSR were from the motif CAGCN<sub>(6)</sub>CTC. We also noticed that 5mC and 4mC sites in the mutant were relatively similar to those in WT, and the slight difference might be caused by sequencing errors. Overall, we propose that HsdMSR only catalyzes 6mA on the motif CAGCN<sub>(6)</sub>CTC and does not affect other 6mA sites or other modification types.

      Please see Figures S4D-E and Lines 212-218 in the revised manuscript.
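
      To make the motif notation concrete, the following sketch scans a sequence for CAGCN<sub>(6)</sub>CTC on both strands. It is not the authors' pipeline; the function names and the toy sequence are assumptions for illustration. N is treated as any of A, C, G, or T, and the reverse-complement pattern is used to report occurrences on the minus strand.

```python
import re

MOTIF = "CAGCNNNNNNCTC"  # CAGCN(6)CTC from the text; N stands for any base

def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T, N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def motif_to_regex(motif):
    # A lookahead is used so that overlapping occurrences are also reported.
    body = motif.replace("N", "[ACGT]")
    return re.compile(f"(?=({body}))")

def find_motif_sites(seq, motif=MOTIF):
    """Return sorted (0-based start, strand) pairs for motif occurrences."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in motif_to_regex(motif).finditer(seq)]
    hits += [(m.start(), "-") for m in motif_to_regex(revcomp(motif)).finditer(seq)]
    return sorted(hits)

# Toy sequence (hypothetical): one plus-strand and one minus-strand occurrence.
toy = "TTCAGCAAAAAACTCGGGAGTTTTTTGCTGAA"
print(find_motif_sites(toy))
```

      Comparing the site list from such a scan with the 6mA calls in WT and in ΔhsdMSR is one way to check how many lost sites fall on the motif.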

      In Figure 6E to conclude that methylation is required on both strands, the authors are missing the control CAGCN6CGC construct otherwise the effect could be linked to the A on the complementary strand.

      Thank you for your valuable suggestions. We have provided the control result for the complementary strand. Please see Figure 6C. The new result supports the conclusion that 6mA regulates gene transcription through methylation on both strands.

      Please see Figure 6C and Lines 329-330 in the revised manuscript.

      Reviewer #2 (Public Review):

      In the present manuscript, Huang et al. revealed the significant roles of the DNA methylome in regulating virulence and metabolism within Pseudomonas syringae, with a particular focus on the HsdMSR system in this model strain. The authors used SMRT-seq to profile the DNA methylation patterns (6mA, 5mC, and 4mC) in three P. syringae strains (Psph, Pss, and Psa) and displayed the conservation among them. They further identified the type I restriction-modification system (HsdMSR) in P. syringae, including its specific motif sequence. HsdMSR participated in metabolism and virulence (T3SS and biofilm formation), as demonstrated through RNA-seq analyses. Additionally, the authors revealed the mechanisms of transcriptional regulation by 6mA. Strictly from the point of view of the interest of the question and the work carried out, this is a worthy and timely study that uses third-generation sequencing technology to characterize DNA methylation in P. syringae. The experimental approaches were solid, and the results obtained were interesting and provided new information on how epigenetics influences transcription in P. syringae. The conclusions of this paper are mostly well supported by data, but some aspects of data analysis and discussion need to be clarified and extended.

      Thank you for your positive feedback and recognition of the importance of our study. We appreciate the suggestions for further clarification and extension of some aspects of data analysis and discussion. We added further analysis of the SMRT-seq results for ΔhsdMSR and overexpressed HsdM in Pst to provide more information on conservation. We added these contents to the discussion in the revised manuscript. Please see Figure 6C and Figure S5.

      Reviewer #3 (Public Review):

      Summary:

      The article by Huang et al. presents an in-depth study on the role of DNA methylation in regulating virulence and metabolism in Pseudomonas syringae, a model phytopathogenic bacterium. This comprehensive research utilized single-molecule real-time (SMRT) sequencing to profile the DNA methylation landscape across three model pathovars of P. syringae, identifying significant epigenetic mechanisms through the Type-I restriction-modification system (HsdMSR), which includes a conserved sequence motif associated with N6-methyladenine (6mA). The study provides novel insights into the epigenetic mechanisms of P. syringae, expanding the understanding of bacterial pathogenicity and adaptation. The use of SMRT sequencing for methylome profiling, coupled with transcriptomic analysis and in vivo validation, establishes a robust evidence base for the findings.

      Strengths:

      The results are presented clearly, with well-organized figures and tables that effectively illustrate the study's findings.

      Weaknesses:

      It would be helpful to add more details, especially in the methods, which make it easy to evaluate and enhance the manuscript's reproducibility.

      Thank you for the positive evaluation of our study, as well as the constructive feedback provided. We have added more details in methods for RNA-seq analysis and Ribo-seq analysis. Please see Lines 484-515.

      “Briefly, bacteria were cultured to an OD<sub>600</sub> of 0.4, at which point chloramphenicol was added to a final concentration of 100 µg/mL for 2 minutes. Cells were then pelleted and washed with pre-chilled lysis buffer [25 mM Tris-HCl, pH 8.0; 25 mM NH4Cl; 10 mM MgOAc; 0.8% Triton X-100; 100 U/mL RNase-free DNase I; 0.3 U/mL Superase-In; 1.55 mM chloramphenicol; and 17 mM GMPPNP]. The pellet was resuspended in lysis buffer, followed by three freeze-thaw cycles using liquid nitrogen. Sodium deoxycholate was then added to a final concentration of 0.3% before centrifugation. The resulting supernatant was adjusted to 25 A260 units and mixed with 2 mL of 500 mM CaCl<sub>2</sub> and 12 µL MNase, making up a total volume of 200 µL. After the digestion, the reaction was quenched with 2.5 mL of 500 mM EGTA. Monosomes were isolated using Sephacryl S400 MicroSpin columns, and RNA was purified using the miRNeasy Mini Kit (Qiagen). rRNA was removed using the NEBNext rRNA Depletion Kit, and the final library was constructed with the NEBNext Small RNA Library Prep Kit. For each sample, ribosome footprint reads were mapped to the Psph 1448A reference genome, and the translational efficiency was calculated by dividing the normalized Ribo-seq counts by the normalized RNA counts. Two biological replicates were performed for all experiments.”
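
      The translational-efficiency calculation described at the end of the methods text (normalized Ribo-seq counts divided by normalized RNA counts) can be sketched as follows. The methods do not state which normalization was used, so counts-per-million is assumed here purely for illustration; the function names, gene names, and counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million (assumed normalization)."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}

def translational_efficiency(ribo_counts, rna_counts):
    """TE per gene = normalized Ribo-seq count / normalized RNA-seq count."""
    ribo = counts_per_million(ribo_counts)
    rna = counts_per_million(rna_counts)
    te = {}
    for gene in ribo.keys() & rna.keys():
        if rna[gene] > 0:  # skip genes without RNA-seq signal to avoid division by zero
            te[gene] = ribo[gene] / rna[gene]
    return te

# Hypothetical read counts for two genes.
ribo_counts = {"geneA": 200, "geneB": 800}
rna_counts = {"geneA": 500, "geneB": 500}
te = translational_efficiency(ribo_counts, rna_counts)
for gene in sorted(te):
    print(gene, round(te[gene], 2))
```

      In practice the ratio would be computed per replicate and compared between WT and mutant; the choice of normalization affects the absolute values but not the principle.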

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I would recommend the authors limit their manuscript to Psph pathovar and include statistical analysis supporting their conclusions.

      Thank you for your suggestion.

      Minor

      • L104: "significantly" please add a p-value and explain the analysis.

      Sorry for the confusion. We have added the p-value and explained the analysis in the method section. The p-value used for SMRT-seq was the modification quality value (QV) score, which was used to call the modified bases A (QV=50) and C (QV=100). Please see Lines 452-454.

      • Figures 1B, D, F, and Figure 2A: make the Venn diagram to scale

      Yes, we have revised.

      • L110-111: missing p-value to say that the authors observe a bigger overlap in Pst than Psph as they observed more modified sites in Pst

      Sorry for the confusion. We described the overlap as bigger in Pst because its value (17.7) exceeds that in Psph (15). To avoid misunderstanding, we revised the wording to “more genes equipped with all three modification types were detected in Pst than Psph”. Please see Lines 110-111.

      • L112: missing description of their Pss analysis (IDP, sites...)

      We have added the information for Pss in the revised manuscript.

      “Additionally, the methylome atlas of Pss revealed a lower incidence of methylation than those of Psph and Pst, particularly in terms of 6mA modifications, which were only seen in 457 significant 6mA occurrences under the same threshold (IPD > 1.5) and a total of 2,853 and 1,438 methylation sites were detected as 5mC and 4mC, respectively”. Please see Lines 114-116.

      • L118: "modification" to "modified "

      We have revised. Please see Line 119.

      • L120: "modification sites" to "modified nucleotides"

      We have revised. Please see Line 121.

      • L142: correct the title "Methylated genes revealed highly functional conservation among three P. syringae strains" maybe to "Methylated genes are functionally conserved among ..."

      We have revised. Please see Line 142.

      • Figure 2C: not easy to read and interpret

      Sorry for the confusion. Figure 2C revealed the significantly enriched functional pathways in GO and KEGG databases among three P. syringae strains. The specific names of each pathway were listed on the left, and each column with dots indicated the number of genes within one kind of methylation in one of three P. syringae strains. The larger the size, the bigger the number.

      We have revised the legend of Figure 2C. Please see Lines 575-579.

      “The dot plot revealed the significantly enriched functional pathways in GO and KEGG databases among three P. syringae strains. The specific names of each pathway were listed on the left, and each column with dots indicated the number of genes within one kind of methylation in one of three P. syringae strains. The size of the dots indicates the number of related genes.”

      • Figure 6B-C: what is the difference between B 24h and C?

      Figure 6B revealed the expression difference between WT and mutant during 24 hours. Figure 6C only showed a time point in 24 hours. To avoid repetition, we have removed Figure 6C.

      • Figure 6C-D: if the same maybe remove Figure 2C

      We have removed Figure 6D.
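
      The QV thresholds quoted in the response to the L104 comment above can be related to p-values if one assumes that the modification QV is Phred-scaled, that is, QV = -10 log10(p). The response does not state this convention, so the conversion below is an assumption and the function name is hypothetical.

```python
def qv_to_pvalue(qv):
    """Convert a Phred-scaled quality value to a p-value (assumes QV = -10 * log10(p))."""
    return 10 ** (-qv / 10)

# Thresholds quoted in the response: QV=50 for A and QV=100 for C.
for base, qv in (("A", 50), ("C", 100)):
    print(base, qv, f"{qv_to_pvalue(qv):.0e}")
```

      Under this assumption, a higher QV threshold corresponds to a stricter p-value cutoff for calling a base modified.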

      Reviewer #2 (Recommendations For The Authors):

      The manuscript could be improved by addressing the following concerns:

      (1) In line 146: How to understand the percentage conserved in "more than two of the strains"?

      Sorry for the confusion. We intended to indicate the patterns that were conserved in two strains and in three strains. We have revised it to: “Notably, about 25% to 45% of methylated genes were conserved in two and three strains”. Please see Line 145.

      (2) In line 178: Five conserved sequence motifs should be replaced by "Six conserved sequence motifs".

      We have revised. Please see Line 176.

      (3) In Figure 2B, specify the C1, C2 and C3. "m6A" should be replaced by "6mA".

      Yes, we have revised.

      (4) In Figure S2, "m6A" should be replaced by "6mA".

      Yes, we have revised.

      (5) In line 212, please add references for the previous studies showing that growth conditions affect bacterial methylation status.

      Thank you for your suggestion. We have added the relevant references (Gonzalez and Collier, 2013), (Krebes et al., 2014), (Sanchez-Romero and Casadesus, 2020).

      (6) In line 217, "illustrate" should be "illustrated".

      Yes, we have revised. Please see Line 210.

      (7) There are some genes colored in grey, revealing bigger differences between the two strains than those related to ribosomal protein, T3SS, and alginate synthesis in Fig. 4A. Do they have important functional roles as well?

      Thank you for your suggestion. A total of 116 genes showed larger differences (|Log<sub>2</sub>FC| > 2), not counting genes related to ribosomal protein, T3SS, and alginate synthesis. Among these genes, 31 were annotated as hypothetical proteins and 4 as transcription factors with unknown functions, and the remaining genes mostly encoded metabolism-related enzymes. These enzymes might contribute to the growth defects in ΔhsdMSR. We added this information to the revised manuscript. Please see Lines 249-254.

      (8) The authors should discuss what will be the potential signals or factors that can regulate the activity of HsdMSR. In other words, what can decide the extent of methylation through activating or suppressing the expression of HsdMSR?

      Thank you for your valuable suggestion. We have added this part to the Discussion. Please see Lines 404-415.

      “Apart from the established roles of 6mA and HsdMSR in P. syringae, certain signals or factors may influence HsdMSR expression. For instance, we confirmed that the growth phase affects methylation levels in P. syringae. Previous studies have shown that increased temperatures can reduce methylation levels, as observed in PAO1 (Doberenz et al., 2017). These findings suggest that HsdMSR expression may be responsive to both intrinsic cellular states and extrinsic environmental conditions. To further explore potential upstream TFs regulating the expression of HsdMSR, we searched for TF-binding sites in the HsdMSR promoter using our own databases (Fan et al., 2020; Shao et al., 2021; Sun et al., 2024). As a result, we found three candidate TFs (PSPPH_0061, PSPPH_3268, and PSPPH_3504) that might directly bind and regulate HsdMSR expression. Future studies on these TFs and their interactions with the HsdMSR promoter would help clarify the regulatory network governing HsdMSR activity.”

      Reviewer #3 (Recommendations For The Authors):

      (1) Some figures contain dense information, which may be overwhelming for readers. Streamlining the legend for Figure 1 and resizing the Venn diagrams within it could enhance clarity and visual appeal.

      Thank you for your suggestion. We have scaled all the Venn plots in the revised version.

      (2) Incorporating a discussion about the role of the restriction-modification (RM) system in bacterial defense against phage infection into the discussion section could enrich the manuscript's context and relevance.

      Thank you for your valuable suggestion. We have added this part to the Discussion. Please see Lines 416-427.

      “RM systems are known for their intrinsic role as innate immune systems against phage infection and are present in around 90% of bacterial genomes (Oliveira et al., 2014). RM systems protect bacteria by recognizing and degrading foreign phage DNA via methylation-specific sites and restriction endonucleases (REases) (Loenen et al., 2014). In addition, other phage-resistance systems are similar to RM systems but carry extra genes. One is the phage growth limitation (Pgl) system, which modifies and cleaves phage DNA. However, Pgl only modifies the phage DNA in the first infection cycle and cleaves phage DNA in subsequent infections, which serves as a warning to neighboring cells (Hampton et al., 2020; Hoskisson et al., 2015). To counteract RM and RM-like systems, phages have evolved strategies including unusual modifications such as hydroxymethylation, glycosylation, and glucosylation. They can also encode their own MTases to protect their DNA or employ strategies to evade restriction systems and other anti-RM defenses (Iida et al., 1987; Murphy et al., 2013; Vasu and Nagaraja, 2013).”

      (3) In line 152: What is the importance of the mentioned example of Cro/CI family TF?

      Thank you for your comments. Cro and CI are important TFs present in phages. The interaction between Cro and CI affects bacterial immunity status in enterohemorrhagic Escherichia coli (EHEC) strains (Jin et al., 2022). RM systems are a kind of phage-defense system, and hence we mentioned this example here. We have added this description to the revised manuscript. Please see Lines 152-154.

      References:

      (1) Doberenz, S., Eckweiler, D., Reichert, O., Jensen, V., Bunk, B., Sproer, C., Kordes, A., Frangipani, E., Luong, K., Korlach, J., et al. (2017). Identification of a Pseudomonas aeruginosa PAO1 DNA Methyltransferase, Its Targets, and Physiological Roles. mBio 8. 10.1128/mBio.02312-16.

      (2) Fan, L., Wang, T., Hua, C., Sun, W., Li, X., Grunwald, L., Liu, J., Wu, N., Shao, X., Yin, Y., et al. (2020). A compendium of DNA-binding specificities of transcription factors in Pseudomonas syringae. Nat Commun 11, 4947. 10.1038/s41467-020-18744-7.

      (3) Gonzalez, D., and Collier, J. (2013). DNA methylation by CcrM activates the transcription of two genes required for the division of Caulobacter crescentus. Mol Microbiol 88, 203-218. 10.1111/mmi.12180.

      (4) Hampton, H.G., Watson, B.N., and Fineran, P.C. (2020). The arms race between bacteria and their phage foes. Nature 577, 327-336.

      (5) Hoskisson, P.A., Sumby, P., and Smith, M.C. (2015). The phage growth limitation system in Streptomyces coelicolor A3(2) is a toxin/antitoxin system, comprising enzymes with DNA methyltransferase, protein kinase and ATPase activity. Virology 477, 100-109.

      (6) Iida, S., Streiff, M.B., Bickle, T.A., and Arber, W. (1987). Two DNA antirestriction systems of bacteriophage P1, darA, and darB: characterization of darA− phages. Virology 157, 156-166.

      (7) Jin, M., Chen, J., Zhao, X., Hu, G., Wang, H., Liu, Z., and Chen, W.-H. (2022). An engineered λ phage enables enhanced and strain-specific killing of enterohemorrhagic Escherichia coli. Microbiology Spectrum 10, e01271-01222.

      (8) Krebes, J., Morgan, R.D., Bunk, B., Sproer, C., Luong, K., Parusel, R., Anton, B.P., Konig, C., Josenhans, C., Overmann, J., et al. (2014). The complex methylome of the human gastric pathogen Helicobacter pylori. Nucleic Acids Res 42, 2415-2432. 10.1093/nar/gkt1201.

      (9) Loenen, W.A., Dryden, D.T., Raleigh, E.A., Wilson, G.G., and Murray, N.E. (2014). Highlights of the DNA cutters: a short history of the restriction enzymes. Nucleic Acids Res 42, 3-19.

      (10) Murphy, J., Mahony, J., Ainsworth, S., Nauta, A., and van Sinderen, D. (2013). Bacteriophage orphan DNA methyltransferases: insights from their bacterial origin, function, and occurrence. Appl Environ Microb 79, 7547-7555.

      (11) Oliveira, P.H., Touchon, M., and Rocha, E.P. (2014). The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res 42, 10618-10631.

      (12) Sanchez-Romero, M.A., and Casadesus, J. (2020). The bacterial epigenome. Nature reviews. Microbiology 18, 7-20. 10.1038/s41579-019-0286-2.

      (13) Shao, X., Tan, M., Xie, Y., Yao, C., Wang, T., Huang, H., Zhang, Y., Ding, Y., Liu, J., Han, L., et al. (2021). Integrated regulatory network in Pseudomonas syringae reveals dynamics of virulence. Cell Rep 34, 108920. 10.1016/j.celrep.2021.108920.

      (14) Sun, Y., Li, J., Huang, J., Li, S., Li, Y., Lu, B., and Deng, X. (2024). Architecture of genome-wide transcriptional regulatory network reveals dynamic functions and evolutionary trajectories in Pseudomonas syringae. bioRxiv, 2024.01.18.576191.

      (15) Vasu, K., and Nagaraja, V. (2013). Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol Mol Biol Rev 77, 53-72. 10.1128/MMBR.00044-12.

    1. eLife Assessment

      This manuscript offers valuable theoretical predictions on how horizontal gene transfer (HGT) can lead to alternative stable states in microbial communities. Using a modeling framework, solid theoretical evidence is provided to support the claimed role of HGT. However, given that the model has many degrees of freedom, a more comprehensive analysis of the role of different parameters could strengthen the study. Additionally, potential interactions between plasmids that carry out HGT are not discussed in the model. This paper would be of interest to researchers in microbiology, ecology, and evolutionary biology.

    2. Reviewer #2 (Public review):

      Summary:

      In this work, the authors use a theoretical model to study the potential impact of Horizontal Gene Transfer on the number of alternative stable states of microbial communities. For this, they use a modified version of the competitive Lotka-Volterra model, which accounts for the effects of pairwise competitive interactions on species growth. The modified model incorporates terms for an added death (dilution) rate acting on all species and for the rates of horizontal transfer of mobile genetic elements, which can in turn affect species growth rates. The authors analyze the impact of horizontal gene transfer in different scenarios, such as bistability between pairs of species and multistability in communities, over an extended range of parameter values. In almost all these cases, the authors report an increase in either the number of alternative stable states or the parameter region (e.g. growth rate values) in which they occur.

      Understanding the origin of alternative stable states in microbial communities and how often they may occur is an important challenge in microbial ecology and evolution. Shifts between these alternative stable states can drive transitions between e.g. a healthy microbiome and dysbiosis. A better understanding of how horizontal gene transfer can drive multistability could help predict alternative stable states in microbial communities, as well as inspire novel treatments to steer communities towards the most desired (e.g. healthy) stable states. In my opinion, this manuscript is a solid theoretical approach to the subject.

      Strengths:

      - Generality of the model: the work is based on a phenomenological model that has been extensively used to predict the dynamics of ecological communities in many different scenarios.

      - The question of how horizontal gene transfer can drive alternative stable states in microbial communities is important and there are very few studies addressing it.

      Weaknesses:

      - In the revised version of the manuscript, the authors significantly extended the analyzed region of parameter values. Still, the model has many parameters and the analysis is typically done by changing one or two parameters at a time. Thus, the work shows how HGT can indeed promote multistability, but it remains hard to grasp whether it consistently does so across a large region of the parameter space.

    3. Reviewer #3 (Public review):

      Hong et al. used a model they previously developed to study the impact of plasmid transfer on microbial multispecies communities. They investigated the effect of plasmid transfer on the existence of alternative stable states in a community. The model most closely resembles plasmid conjugation, where the transferred genes confer independent growth-related fitness effects and different plasmids do not affect each other's transfer or growth effects. For this process, the authors find that increasing the rate of plasmid transfer leads to an increasing number of stable states, as long as the model includes a constant death/dilution term.

      This is an interesting and important topic, and I welcome the authors' efforts to explore these topics with mathematical modeling. The addition of sensitivity analyses also strengthens the usefulness for quantitative microbial ecologists. However, the additional sections have made the main text harder to read. Between the effect of the dilution rate, the increase in subpopulations with HGT, and the modulation of interspecies competition, the reviewers have suggested a number of factors that may explain the way plasmid transfer modulates multistability. I think it would be helpful if the authors could provide a fuller summary of these effects and of the interactions between different parameters in their model. I personally continue to find the model very unintuitive, especially in the way it averages over subpopulations carrying more than one foreign plasmid. Additional sentences that give the reader intuition for the sensitivity analyses and how these interplay with the results would be good.

      Specific points

      (1) The model makes strong assumptions about the biology of HGT, that could be spelled out even more. Since the model is primarily applicable to HGT driven by the exchange of plasmids, I believe the abstract (and perhaps even the title of the paper) should be updated to reflect that.

      (2) I am not surprised that a mechanism that creates diversity will lead to more alternative stable states. Specifically, the null model for the absence of HGT is to set gamma to zero, resulting in pij=0 for all subpopulations (line 454). This means that a model with N^2 classes is effectively reduced to N classes. It seems intuitive that an LV-model with many more species would also allow for more alternative stable states. For a fair comparison one would really want to initialize these subpopulations in the model (with the same growth rates - e.g. mu1(1+lambda2)) but without gene mobility.

      [Update:] It is good that initializing pij with non-zero abundance did not seem to affect the conclusion that higher amounts of HGT increase multistability. However, rather than listing it as one control for a specific condition, I would argue that this is the appropriate null model across the board (where HGT rate is varied from 0 to a high value), including figures S9 and S10.

      (3) The possibility that the same cell may be counted in different pij runs counter to all intuition that researchers coming from a background of compartmental/epidemiological modeling may have. The associated assumption that plasmids do not affect each other's dynamics or (growth/interaction) effects at all is also a very strong one. This should be signaled much earlier in the manuscript, possibly already in line 106 when the model is introduced.

    1. eLife Assessment

      This important study combines virology experiments and mathematical modeling to determine the nuclear export rate of each of the eight RNA segments of the influenza A virus, leading to the proposal that a specific retention of mRNA within the nucleus delays the expression of antigenic viral proteins. The proposed model for explaining the differential rate of export is compelling, going beyond the state of the art, but the experimental setup is only in partial support and further studies will be needed to confirm the proposed mechanism.

    2. Reviewer #1 (Public review):

      The authors studied why the two more antigenic proteins of the influenza A virus, hemagglutinin (HA) and neuraminidase (NA), are expressed later during the infection. They set an experimental approach consisting of a 2-hour-long infection at a multiplicity of infection of 2 with the viral strain WSN. They used cells from the lung carcinoma cell line A549. They used the FISH technique to detect the mRNAs in situ and developed an imaging-based assay for mathematically modeling and estimating the nuclear export rate of each of the eight viral segments. They propose that the delay in the expression of HA and NA is based on the retention of their mRNA within the nucleus.

      Strength

      The study of an unaddressed mechanism in the influenza A virus infectious cycle, namely the late expression of HA and NA, by creating a workflow that combines mRNA detection (FISH) with mathematical calculations to arrive at a model, which additionally could be useful for general biological processes where transcription occurs in a burst-like manner.

      Weakness

      The authors built on several assumptions regarding the viral infection to "quantify" the transcripts' export rate, and these assumptions lack experimental support. The study would greatly improve if more precise experiments could be performed and/or if the assumptions made were demonstrated (i.e., empirically demonstrating that cRNA production does not occur within the first 2 hours of infection, and the late expression of HA and NA proteins).

    3. Reviewer #2 (Public review):

      In this study the authors developed a framework to investigate the export rates of influenza viral RNAs translocating from the nucleus to the cytoplasm. This model suggests that the influenza virus may control gene expression at the RNA export level: the retention of certain transcripts in the nucleus for longer times allows the generation of other viral encoded proteins whose mRNAs are exported regularly, and only later on do the retained mRNAs get exported. These encode proteins that alert the cell to the presence of viral molecules, hence keeping their emergence to the very end might help the virus to avoid detection until as late as possible in the infection cycle.

      The study is of limited scope. The notion that some mRNAs are retained in the nucleus after transcription is concluded early on from the FISH data. The model does not contribute much to the understanding and is mostly confirming the FISH data. The export rate is an ambiguous number and this part is not elaborated upon. One is left with more questions since no mechanistic knowledge emerges, and no additional experimentation is attempted to try to drive toward a deeper understanding.

      Comments on revisions:

      The authors have implemented the comments that required textual rewriting, which does make the paper clearer. On the experimental side, very little was done. It is fine to answer that the suggested experiments are not relevant or feasible for one reason or another, but one would expect to see some effort in providing other experimental sets to address key comments, and not only to modify a sentence in the text. So in my mind this round of revision feels more like some kind of intellectual discussion, which is fine, but I would have expected more, particularly after so much time has passed. I am still not satisfied with the way the analysis is presented in Fig. 2B, and writing a line about what is not analyzed in the legend, does not seem clear enough.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for the comments and suggestions on our manuscript.  The main point that we wished to convey in this paper was the concept and the kinetic model that enabled the estimation of nuclear export rate from an image of single mRNAs localised in single cells.  By studying the influenza viral transcripts with this model, we report the variation in the mRNA nuclear export rate of the eight viral segments.  Of note, the hemagglutinin and neuraminidase mRNAs were the slowest among the eight segments in exiting the nucleus.  We agree that the potential mechanism and the biological impact of this observation require further validation, as the reviewers pointed out.  We revised our manuscript to describe these points separately (Lines 21-25, Abstract; Lines 86-91, Introduction; Lines 316-320, Results; Lines 372-381, Discussion).  We also highlight below, the revisions that we made to address the specific points raised by the reviewers.  

      Influenza viral transcription

      The authors used specific settings for their virology experiments and several assumptions regarding their mathematical modelling, so it's extremely important that the reader has the viral life cycle clearly understood before immersing themselves in the results. Thus, a detailed explanation of the viral life cycle, including the kinetics of each step, would be extremely helpful if included in the introduction section.  Reviewer #1

      We have included the molecular composition of influenza vRNP and the mechanism of viral transcription in the revised manuscript (Lines 46-53).  

      Line 45: "Eight viral RNA segments are transcribed by the same set of molecular machinery" (Ref. 7). What's known about the arrival of the viral RNA segments in the nucleus? Is it synchronized? The authors will understand that my concern is related to the fact that a differential arrival would indeed impact the transcription and export processes.  Reviewer #1

      The arrival of eight vRNPs in the nucleus is not synchronised, with each of the eight vRNPs arriving independently (Chou et al. PLOS Pathogens 2013) (Lakadamyali et al, PNAS 2003).  This does not compromise our model, as our model estimates the export rate of each mRNA species individually (also please see our response in Model assumption below).  This is included in the second paragraph of the Discussion section (Lines 390-400).  

      Model assumption

      Even though I do not have the expertise to assess the authors' mathematical model, I do not doubt its robustness. Even so, I find some virological concerns related to the set-up of their experiments. According to what I understand, the authors performed non-synchronized 2 h-long infections with the WSN strain of influenza A virus. They did this to avoid cRNA production (and cross-reaction of the probes), which they claim to occur "much later than mRNA synthesis". Then they omit the degradation of the mRNAs for their model without giving an explanation for having done so. So, taking all these into account, it seems to me that too many assumptions are made without a strong argument. I understand that they are made in order to simplify their model, but I strongly consider that the model would gain strength if some of these events were experimentally considered. Thus, would it be possible to perform synchronized infections? Would it be possible to empirically demonstrate that cRNA production does not occur within the first 2 hours of infection and/or separate transcription and replication? Would it be possible to incorporate a degradation inhibitor of the mRNAs into their infections? If all these could be achieved, then the results coming out of the mathematical model would be enormously reinforced.  Reviewer #1

      * The study lacks experimental data that would help support the conclusions. For instance, perturbations are often used to prove a point related to gene expression. An example for Fig. 2 for such an experiment could be to treat the cells with transcription inhibitors (e.g. DRB, 5,6-dichloro-1-beta-D-ribofuranosylbenzimidazole). Preventing transcription leaves only mature RNAs in the nucleus, and then using this system one can compare the export rate of different RNAs.  Reviewer #2

      We agree that the primary concern in our model was the assumption that the mRNA degradation could be omitted.  Synchronised infection is not necessary; in fact, non-synchronised infection is preferred, as we explain later in our response.  Additionally, the dominance of mRNA production over the cRNA production has been documented elsewhere.  To address mRNA degradation and validate our model estimation, we performed a time-course measurement using baloxavir.  Baloxavir efficiently blocks the viral transcription by inhibiting the nuclease activity in PA.  DRB, suggested by the reviewer, allows influenza viral transcription and causes viral transcripts to accumulate in the nucleus through unknown mechanisms (Amorim et al. Traffic 2007 and our observation using smFISH, not shown).  The additional experiment, now presented in Fig. 5 in the revised manuscript, indicated that the mRNA degradation is minimal, and the export rate estimated in our model and the time-course experiment agreed well for the HA segment.  The experiment raised the possibility that the time-course measurement underestimates the export rate of transcripts that exit the nucleus rapidly, such as NP.  Real-time imaging of single transcripts would be necessary to directly measure the true nuclear export rate; however, this is beyond the scope of our paper.  The new result is now presented in Fig. 5, Supplementary figures 3 and 4, and in the main text (Lines 322-360).  An alteration was also made in Line 286 to point to Fig. 5.  The Materials and Methods section was updated (Lines 478-482).

      We note that our model does not require synchronised infection.  Even under synchronised infection, such as incubating cells with the virus at 4°C to facilitate attachment and subsequently shifting to 37°C to allow viral entry, the inherent heterogeneity in vRNP migration to the nucleus still remains.  This randomness does not compromise our model; rather, our model exploits this random arrival of each vRNP in each cell in the system.  This variation, in turn, generates cells carrying varying amounts of transcripts, enabling the estimation of nuclear export rate.  Importantly, more variation ensures the broader distribution of transcript levels, enabling more precise parameter fitting in our model.  It is also important to note that our model does not require the correlation between segments.  Our model estimates the export rate of each mRNA species individually.  These important points were explained in the Discussion section (Lines 390-400).  
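
      The argument that unsynchronised arrival helps the estimation can be illustrated with a toy calculation. This is a sketch under stated assumptions and not the model from the manuscript: each cell is assumed to synthesise transcripts at a constant rate `alpha` from the moment its vRNP arrives, to export them with first-order rate `lam`, and to degrade none. Under these assumptions the nuclear count is (alpha/lam)(1 - exp(-lam t)) and the total is alpha t, so the nuclear ratio depends on the total count only through lam/alpha. The names `nuclear_ratio`, `simulate_cells`, `alpha`, `lam` and all numbers are hypothetical.

```python
import math
import random


def nuclear_ratio(total, lam_over_alpha):
    """Nuclear fraction N/(N+C) for a cell holding `total` transcripts,
    assuming constant synthesis, first-order export and no degradation."""
    x = lam_over_alpha * total
    if x == 0:
        return 1.0  # nothing has had time to leave the nucleus
    return (1.0 - math.exp(-x)) / x


def simulate_cells(n_cells, alpha, lam, t_end=120.0, seed=0):
    """Draw cells whose transcription has been running for a random time,
    mimicking unsynchronised arrival; returns (total, nuclear ratio) pairs."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        t_on = rng.uniform(0.0, t_end)  # minutes since transcription began
        total = alpha * t_on            # transcripts made so far
        cells.append((total, nuclear_ratio(total, lam / alpha)))
    return cells


# Two hypothetical segments that differ only in export rate (per minute).
slow_segment = simulate_cells(200, alpha=1.0, lam=0.01)
fast_segment = simulate_cells(200, alpha=1.0, lam=0.10)
```

      In this toy setting, the spread of arrival times is what spreads the cells along the total-count axis, and a slowly exported segment sits at a higher nuclear ratio than a rapidly exported one at the same total count.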

      * There is no concrete value given for the export rates and what they might mean biologically (e.g. time present/stuck in the nucleus) - Fig. 4D. This leaves the reader in the dark.  Reviewer #2

      The export rate lambda (previously denoted as k) in our model (Fig. 4) and the decay constant k in the time-course measurement (Fig. 5) represent the proportion of mRNAs exported from the nucleus in an infinitesimal time, defining the nuclear export rate.  This has been clarified in the revised manuscript (Lines 314-316), with some alterations to make the parameter use more comprehensive.  

      -  The Greek letter k previously used in Fig. 4 and the associated equations was consistently replaced with lambda to avoid the confusion with the parameter k that is subsequently used for the exponent decay in Fig. 5 in the revised manuscript.  

      -  The Greek letter epsilon (previously used to represent export) was replaced with mu, slightly more common for representing the rate of transport.  

      -  The term “velocity” was consistently replaced with “rate” in the context of the nuclear export (Lines 163, 215, 320, 441).  

      -  The phrase “molar concentrations of mRNAs” was corrected to “molecules of mRNAs” (Line 282).

      Also, we have now described our model in two sections: “Conceiving the model” and “Implementing a kinetic model to estimate the nuclear export rate” in the Result.  The first section outlines the conceptual framework of the model, and the second focuses on its implementation and the parameter extraction (Lines 227 and 277).  
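
      The reading of the rate constant as the proportion of nuclear mRNAs exported per unit time can be made concrete with a short worked example. It is a sketch that assumes synthesis is fully blocked and degradation is negligible, so that the nuclear count declines as N(t) = N0 exp(-k t); the mean nuclear residence time is then 1/k and the half-life is ln(2)/k. The function name `fit_decay_constant` and the data points are invented for illustration and are not measurements from the study.

```python
import math


def fit_decay_constant(times, nuclear_counts):
    """Estimate k from N(t) = N0 * exp(-k t) by a least-squares line
    through ln(count) versus time; k is minus the fitted slope."""
    logs = [math.log(c) for c in nuclear_counts]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx


# Synthetic, noise-free time course (minutes) generated with k = 0.02 per minute.
times = [0, 15, 30, 45, 60]
true_k = 0.02
counts = [100.0 * math.exp(-true_k * t) for t in times]

k = fit_decay_constant(times, counts)
half_life = math.log(2) / k   # time for half the nuclear pool to exit
mean_residence = 1.0 / k      # average time a transcript stays nuclear
```

      Reporting the half-life or mean residence time alongside the rate constant is one way to express the export rate in units that are easier to interpret biologically.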

      Applicability of the model

      Lines 27-29. "Our framework presented in this study can be widely used for investigating the nuclear retention of nascent transcripts produced in a transcription burst." In my opinion, this is the strongest point of the manuscript: developing a mathematical model to analyze nuclear export retention as a mechanism of protein expression control, which could lay the foundation for further biological processes. The authors revisit this idea in the Discussion section. However, which would be those processes for which the model could be helpful? I consider that a more conspicuous discussion on this topic would broaden the readers scope, a crucial point under the eLife scope.  Reviewer #1

      * Could this framework be used to quantify the nuclear export rate of cellular RNAs? According to the explanation in the Discussion, it would seem that this approach is limited to quantifying the export rate of influenza RNAs.  Reviewer #2

      Our model is not limited to the influenza virus infection.  Our model is applicable to systems where transcription is initiated concurrently, such as when stimuli trigger the activation of a certain set of genes for transcription.  This makes it particularly valuable for quantifying the nuclear retention of mRNAs in a transcription burst.  This point is reiterated in Lines 383-390.

      Potential mechanisms for differential nuclear export rate of viral segments

      * There is no mechanistic insight in the study. The idea driven by this study is that gene expression is regulated by the RNA export rate. But how is that explained? Is there any molecular pathway or explanation for this model? If the transcripts are ready for export, why do the mRNAs stay inside the nucleus? One option to consider are the export factors. Viral RNAs are exported by different pathways as mentioned (line 362), or by TREX2 (Bhat P et al Nat Comm 2023). The data shows that there is no difference observed in the export rate of different pathways. How about knocking down an important export factor to show how this affects the export rates. Or the opposite, overexpress a certain factor, would this change the nucleus/cytoplasm distribution of the retained RNAs.  Reviewer #2

      As we discussed in the paper, we are beginning to consider that each viral segment has an intrinsic sequence that determines its nuclear export rate, because previous studies on the export factors do not fully explain the variation in the nuclear export rate observed in our study.  As the reviewer suggested, a recent study (Bhat et al. Nature Communications 2023) pointed to an internal sequence in the HA segment, aligning with our working hypothesis.  This point is discussed and their work (Bhat et al. 2023) has been cited in the Discussion section in the revised manuscript (Lines 446-449).

      Biological impact of the nuclear retention

      The authors mention several times throughout the manuscript that the virus might use the nuclear retention of mRNA for HA and NA to postpone the expression of these antigenic molecules. At this point, I need to admit that a great question mark appeared in my mind, maybe related to the fact that some knowledge is lacking in my analysis. Lines 328-330: "On the other hand, pushing back the expression of viral antigens HA and NA would be beneficial for the virus to delay the host immune response against the infected cells in which the virus is being replicated." As I tend to understand, the host immune response recognizes HA and NA within the viral particle. If so, and independently of the time that HA and NA arrive at the virus assembly step, the progeny viral particles that are complete and extruded from the cells would be those awakening the host immune response. If this is right, how would the delayed export of those mRNAs from the nucleus (and the late expression of the proteins) be beneficial for delaying the immune response? I would appreciate an explanation for this point, and if I am wrong, then there could exist a relationship between nuclear export rate and the pathogenicity of different strains of influenza A virus. If so, could the authors challenge their model with additional viral strains showing a differential immune response pattern? A deeper analysis in this direction would greatly strengthen the message in their manuscript.  Reviewer #1

      * Is the timing of viral protein appearance in accordance with the time the mRNA is exported to the cytoplasm? It is logical that the first mRNA to go to the cytoplasm would be the first to become a protein. Can the authors show that nuclear retention of mRNA would push back the expression of the viral antigens HA and NA?  Reviewer #2

      Three types of immune reactions are being studied extensively.  The first is the humoral immune response, where antibodies target the viral antigens HA and NA on the viral envelope, coating and inactivating the viral particles.  The second is the cytotoxic T cell response.  There is growing evidence that cytotoxic T cells react against NP, eliciting cross-reaction to broader range of influenza viral strains.  This reaction is not specific to HA and NA, and antigens are processed in the cytoplasm and presented on the MHC.  The third is antibody-dependent cellular cytotoxicity (ADCC), where antibodies recognise the viral proteins on the cellular surface (HA and NA) of infected cells, facilitating their elimination by the NK cells.  Although protein translation may begin as soon as the first mRNA exits the nucleus, the virus may delay the peak of the antigen production and therefore, postpone the NK-mediated ADCC.  This specific point, along with references to ADCC in influenza virus infection, has been clarified in the Discussion section (Lines 377-381).  

      Data analysis and presentation

      Lines 99-101. "Viral mRNAs were detected as single diffraction-limited spots in the three-dimensional image stacks, allowing for absolute mRNA quantification (Fig. 1B)". What do the authors mean to say by "absolute mRNA quantification"? Do they refer to the total spots or the total mRNAs? Is it assumed that one spot corresponds to a single mRNA transcript? This is not clear at all for this reviewer, which could be the situation for a potential reader. Since it's the beginning of the story, this should be clearly stated in the manuscript.  Reviewer #1

      Each spot of fluorescent signal corresponds to a single molecule of viral mRNA.  We quantified the absolute number of transcripts in each cell.  This is clarified in the revised manuscript (Lines 104-106).  

      * Line 151: does the baseline change according to the RNA in question? The authors say that the "baseline is defined by the median of the Z distribution of peripheral mRNAs" - it seems that the number 0.731 refers only to one type of RNA (which is not mentioned at all, neither in the text nor in the legend). Reviewer #2

      The baseline was set using the NP mRNAs in the cytoplasm because the NP mRNA showed the widest distribution across the cytoplasm (Line 157).  

      * Also, what is all the signal that is seen outside the marked cells in Fig. 2B? There seems to be significant background in the field, does this mean much false-positive in the multiplex FISH? If so, then how do the authors know that the staining inside the cells isn't to some degree non-specific? It would be necessary to back this up with some other type of quantitative assay like qRT-PCR.  Reviewer #2

      The cells were removed from the analysis if the cytoplasmic boundary touched any edge of the field-of-view, while the signals were recovered across the entire field-of-view.  This is clarified in the figure legend (Lines 194-195).  
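
      The exclusion rule described in this response can be sketched as a simple filter on a segmentation mask. This is an illustration only, assuming the cytoplasmic segmentation is available as a 2D label image in which 0 is background and each cell has its own integer label; the function names and the tiny example mask are hypothetical and do not come from the authors' pipeline.

```python
def labels_touching_edge(label_image):
    """Return the set of non-zero labels present on any border pixel."""
    rows, cols = len(label_image), len(label_image[0])
    edge = set()
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and label_image[r][c] != 0:
                edge.add(label_image[r][c])
    return edge


def keep_interior_cells(label_image):
    """Labels of cells whose mask never reaches the field-of-view edge."""
    all_labels = {v for row in label_image for v in row if v != 0}
    return sorted(all_labels - labels_touching_edge(label_image))


# Toy field of view: cells 1 and 3 touch the border, cell 2 does not.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 2, 2, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 0, 0, 3],
]
kept = keep_interior_cells(mask)
```

      Spots detected outside the kept labels would then simply not be assigned to any analysed cell, which is consistent with signal being visible across the whole field while only interior cells are quantified.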

      Others

      * The meaning and explanation for Figure 1H are unclear. Rephrase and make the legend more reader friendly.  Reviewer #2

      We made alterations to the legend (Lines 132-134) and the relevant lines in the main text (Lines 148-151).  

      * Fig. 2E: Is this the total transcript count or only in the nucleus? Would it be possible to find some correlation between the segments if a pair-wise analysis is performed according to nuclear-cytoplasm distribution?  Reviewer #2

      The total counts are presented.  This is clarified in the legend (Lines 199-200).  

      * Abstract -"A mathematical modelling indicated that the relationship between the nuclear ratio and the total count of mRNAs in single cells is dictated by a proxy for the nuclear export rate." - this sentence is very unclear.  Reviewer #2

      The sentence was removed in the revised manuscript (Line 21).  This removal did not affect the overall meaning in the abstract.  We also made an alteration to Line 279 that contained a similar phrase.  

      * The use of the word "acutely" (lines 16 and 35) is strange.  Reviewer #2

      They have been removed (now Lines 15, 33).  

      * Line 157 - "This result indicates that the velocity of viral mRNA export from the nucleus varies according to the viral segments." - not velocity, maybe timing.  Reviewer #2

      We consistently replaced “velocity” with “rate” (Lines 163, 215, 320, 441).

      * Reference for line 41.  Reviewer #2

      A reference (Waker et al. Trends Microbiol. 2019) has been cited (Line 39).  

      * Reference for lines 105-106.  Reviewer #2

      The gene length of each segment was indicated in the sentence (Line 137).  

      * Line 264- why here is 0.02 M.O.I used compared to line 97 where 2 is used?  Reviewer #2

      We used M.O.I. of 0.02 to allow for spot quantification over longer periods of observation (Lines 269-270).  

      * NS1 is expressed at late infection times and might alter the nuclear export of viral mRNAs (line 352). Need to show that indeed it is not expressed in the experiments done here.  Reviewer #2

      It is not possible to definitively prove that NS1 is not expressed due to the sensitivity limitations.  However, we minimised its impact by investigating at the early time point (Lines 415-416).

      * Line 459- 30% formamide? Is this correct or should it be 10%?  Reviewer #2

      This is correct.  The probes used were longer than the others for smFISH.  Therefore, we washed away the probes with the stringent condition.

    1. eLife Assessment

      This study reports a model of 8 somatosensory areas of the rat cortex consisting of 4.2 million morphologically and electrically detailed neurons. The authors carry out simulation experiments aimed at understanding how multiscale organization of the cortical network shapes neural activity. While the reviewers found the results to be solid, they note that they could have likely been obtained using a much smaller portion of the model. Nonetheless, the release of the modeling platform represents a significant contribution to the field by providing a valuable resource for the scientific community.

    2. Reviewer #1 (Public review):

      This paper presents a model of the whole somatosensory non-barrel cortex of the rat, with 4.2 million morphologically and electrically detailed neurons, with many aspects of the model constrained by a variety of data. The paper focuses on simulation experiments, testing a range of observations. These experiments are aimed at understanding how multiscale organization of the cortical network shapes neural activity.

      Strengths

      • The model is very large and detailed. With 4.2 million neurons and 13.2 billion synapses, as well as the level of biophysical realism employed, it is a highly comprehensive computational representation of the cortical network.

      • Large scope of work - the authors cover a variety of properties of the network structure and activity in this paper, from dendritic and synaptic physiology to multi-area neural activity.

      • Direct comparisons with experiments, shown throughout the paper, are laudable.

      • The authors make a number of observations, like describing how high-dimensional connectivity motifs shape patterns of neural activity, which can be useful for thinking about the relations between the structure and the function of the cortical network.

      • Sharing the simulation tools and a "large subvolume of the model" is appreciated.

      Weaknesses

      • A substantial part of this paper - the first few figures - focuses on single-cell and single-synapse properties, with high similarity to what was shown in Markram et al., 2015. Details may differ, but overall it is quite similar.

      • Although the paper is about the model of the whole non-barrel somatosensory cortex, out of all figures, only one deals with simulations of the whole non-barrel somatosensory cortex. Most figures focus on simulations that involve one or a few "microcolumns". Again, it is rather similar to what was done in Markram et al., 2015 and constitutes relatively incremental progress.

      • With a model like this, one has an opportunity to investigate computations and interactions across an extensive cortical network in an in vivo-like context. However, the simulations presented are not addressing realistic specific situations corresponding to animals performing a task or perceiving a relevant somatosensory stimulus. This makes the insights into roles of cell types or connectivity architecture less interesting, as they are presented for relatively abstract situations. It is hard to see their relationship to important questions that the community would be excited about - theoretical concepts like predictive coding, biophysical mechanisms like dendritic nonlinearities, or circuit properties like feedforward, lateral, and feedback processing across interacting cortical areas. In other words, what do we learn from this work conceptually, especially, about the whole non-barrel somatosensory cortex?

      • Most of the comparisons with in vivo-like activity are done using experimental data for whisker deflection (plus some from the visual stimulation in V1). But this model is for the non-barrel somatosensory cortex, so exactly the part of the cortex that has less to do with whiskers (or vision). Is it not possible to find any in vivo neural activity data from non-barrel cortex?

      • The authors almost do not show raw spike rasters or firing rates. I am sure most readers would want to decide for themselves whether the model makes sense, and for that the first thing to do is to look at raster plots and distributions of firing rates. Instead, the authors show comparisons with in vivo data using highly processed, normalized metrics.

      • While the authors claim that their model with one set of parameters reproduces many experimentally established metrics, that is not entirely what one finds. Instead, they provide different levels of overall stimulation to their model (adjusting the target "P_FR" parameter, with values from 0 to 1, and other parameters), and that influences results. If I get this right (the figures could really be improved with better organization and labeling), simulations with P_FR closer to 1 provide more realistic firing rate levels for a few different cases; however, P_FR of 0.3 and possibly above tends to cause highly synchronized activity - what the authors call bursting, but which also could be called epileptic-like activity in the network.

      • The authors mention that the model is available online, but the "Resource availability" section does not describe that in substantial detail. As they mention in the Abstract, it is only a subvolume that is available. That might be fine, but more detail in appropriate parts of the paper would be useful.

      Comments on revisions:

      The authors addressed all my comments by revising and adding text as well as revising and adding some figures and videos. The limitations described in my previous review (above) mostly remain, but they are much better acknowledged and described now. These limitations can be addressed in the future work, whereas the current paper represents a step forward relative to the state of the art and provides a useful resource for the community.

      Two minor points about the new additions to the paper:

      (1) Something does not seem right in the sentence, "Unlike the Markram et al. (2015) model, the new model can also be exploited by the community and has already been used in a number of follow up papers studying (Ecker et al., 2024a,b; ...)". Should the authors remove "studying"?

      (2) It is great that the authors added more plots and videos of the firing rates, but most of them show maximum-normalized rates, which sort of defeats the purpose. No scale on the y-axis is shown (it can be useful even for normalized data). And it is impossible to see anything for inhibitory populations.

      These are minor points that may not need to be addressed. Overall, it is a nice study that is certainly useful for the field.

      A great improvement is that the model is made fully available to the public.

    1. eLife Assessment

      This important study investigates the implications of human endogenous retrovirus (HERV) activity in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). These findings indicate significant associations that coincide with previous literature, which has suggested roles for differential HERV activity in degenerative, inflammatory, and aging-related pathologies of the central nervous system (CNS), as well as neurotropic infections. These seminal studies can be strengthened with minor improvements to the methodologies of characterizing differential HERV activity, further characterizing downstream mechanisms by which HERV activity impacts disease and by an expansion of the datasets utilized to include additional cohorts. These compelling findings are of immediate importance to clinicians, policymakers, and researchers interested in the underlying etiology of human health and disease.

    2. Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weaknesses:

      (1) The cohort analysed in this study was phenotyped by a single clinician. As ME/CFS and FM are diagnosed based on unspecific symptoms and are frequently misdiagnosed, this raises the question of whether the results can be generalised to external cohorts.

      (2) The analyses performed to unravel the causes and effects of HERV expression in ME/CFS and FM are solely based on sequencing data. Experimental approaches could be used to validate some of the transcriptomic observations.

    3. Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the HERV-V3 array developed previously, to characterize the genome-wide changes in the expression of HERVs in patients suffering from ME/CFS, FM, or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking a better understanding of the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      (1) The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      (2) The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      (1) The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analyses in this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      (2) The authors in some points are not thorough with the cited literature. Two examples are:

      a) Lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work in a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      b) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      (3) When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

    4. Reviewer #3 (Public review):

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weaknesses:

      (1) The cohort analysed in this study was phenotyped by a single clinician. As ME/CFS and FM are diagnosed based on unspecific symptoms and are frequently misdiagnosed, this raises the question of whether the results can be generalised to external cohorts.

      Thank you for your comment. Surely the study of larger cohorts will determine the external validity of these results in a clinical scenario. However, this pilot study, the first of its kind, was designed to maximize homogeneity across participants, which seemed primarily ensured by including only females diagnosed by a single experienced observer.

      (2) The analyses performed to unravel the causes and effects of HERV expression in ME/CFS and FM are solely based on sequencing data. Experimental approaches could be used to validate some of the transcriptomic observations.

      Certainly, experimental approaches may add robustness to our findings. In fact, we are considering this avenue to deepen the observations presented here. However, the limited knowledge of HERV-mediated physiological functions may hinder the task of revealing causes and effects of HERV expression in ME/CFS and FM in the short term.

      Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the HERV-V3 array developed previously, to characterize the genome-wide changes in the expression of HERVs in patients suffering from ME/CFS, FM, or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking a better understanding of the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      (1) The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      (2) The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      (1) The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analyses in this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      This is an important point. However, the low number of probes that were excluded from our analysis for lack of correspondence with hg38 (fewer than 100 of the 1,290,800 probesets) was interpreted as insignificant for "genome-wide" claims. We will detail this aspect in the revised version of this manuscript.

      (2) The authors in some points are not thorough with the cited literature. Two examples are:

      a) Lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work in a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      b) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      Certainly, these two sentences need rephrasing to better adjust statements to current evidence and will be replaced in the revised version of this manuscript.

      (3) When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Using control DE HERVs as bona fide markers of healthy individuals seems like an interesting possibility worth exploring. Control DE HERVs (cluster 4) are indeed associated with DE genes involved in apoptosis, T cell activation and cell-cell adhesion (modules 1 and 6) (Figure 3A). The impact of these associations deserves further study.

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

      We agree with the reviewer that sharing genomic locations of DE HERVs in these pathologies would contribute to further development of our findings. Unfortunately, we do not hold the rights to share probe coordinates from this custom HERV-V3 microarray, which we used under an MTA with its developer.

      Reviewer #3 (Public review):

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

    1. eLife Assessment

      This study investigated the influence of genomic information and timing of vaccine strain selection on the accuracy of influenza A/H3N2 forecasting. The authors utilised appropriate statistical methods and have provided solid evidence that is an important contribution to the evidence base. While the study addresses a key aspect of public health, the impact is rather limited by its exclusive reliance on predictive methods using genomic information, without incorporating phenotypic data.

    2. Reviewer #1 (Public review):

      Summary:

      In the paper, the authors investigate how the availability of genomic information and the timing of vaccine strain selection influence the accuracy of influenza A/H3N2 forecasting. The manuscript presents three key findings:

      (1) Using real and simulated data, the authors demonstrate that shortening the forecasting horizon and reducing submission delays for sharing genomic data improve the accuracy of virus forecasting.

      (2) Reducing submission delays also enhances estimates of current clade frequencies.

      (3) Shorter forecasting horizons, for example those allowed by the proposed use of "faster" vaccine platforms such as mRNA, result in the most significant improvements in forecasting accuracy.

      Strengths:

      The authors present a robust analysis, using statistical methods based on previously published genetic-based techniques to forecast influenza evolution. Optimizing prediction methods is crucial from both scientific and public health perspectives. The use of simulated as well as real genetic data (collected between April 1, 2005, and October 1, 2019) to assess the effects of shorter forecasting horizons and reduced submission delays is valuable and provides a comprehensive dataset. Moreover, the accompanying code is openly available on GitHub and is well-documented.

      Weaknesses:

      While the study addresses a critical public health issue related to vaccine strain selection and explores potential improvements, its impact is somewhat constrained by its exclusive reliance on predictive methods using genomic information, without incorporating phenotypic data. The analysis remains at a high level, lacking a detailed exploration of factors such as the genetic distance of antigenic sites.

      Another limitation is the subsampling of the available dataset, which reduces several tens of thousands of sequences to just 90 sequences per month with even sampling across regions. This approach, possibly due to computational constraints, might overlook potential effects of regional biases in clade distribution that could be significant. The effect of dataset sampling on presented findings remains unexplored. Although the authors acknowledge limitations in their discussion section, the depth of the analysis could be improved to provide a more comprehensive understanding of the underlying dynamics and their effects.
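To make the subsampling concern concrete, the following is a minimal sketch of one way to draw a fixed number of sequences per month while spreading them evenly across regions. It is an illustration only and not the authors' actual workflow: the function name `subsample_evenly`, the record layout, and the round-robin rule are all assumptions introduced here. Only the figure of 90 sequences per month comes from the text above.

```python
import random
from collections import defaultdict


def subsample_evenly(records, per_month=90, seed=0):
    """Pick up to `per_month` strains per month, spread evenly across regions.

    `records` is an iterable of (strain_id, "YYYY-MM", region) tuples.
    This layout is hypothetical and chosen only for the illustration.
    Returns a list of (month, strain_id) tuples.
    """
    rng = random.Random(seed)

    # Group strain identifiers by month, then by region.
    by_month = defaultdict(lambda: defaultdict(list))
    for strain, month, region in records:
        by_month[month][region].append(strain)

    selected = []
    for month in sorted(by_month):
        # Shuffle each region's pool so the draws within a region are random.
        pools = {
            region: rng.sample(strains, len(strains))
            for region, strains in by_month[month].items()
        }
        picked = []
        # Round-robin over regions: each region contributes one strain per
        # pass until the monthly quota is met or every pool is empty.
        while len(picked) < per_month and any(pools.values()):
            for region in sorted(pools):
                if pools[region] and len(picked) < per_month:
                    picked.append(pools[region].pop())
        selected.extend((month, strain) for strain in picked)
    return selected
```

Under this rule, a region with few sequences is exhausted quickly and the remaining quota is filled from better-sampled regions, so the result is only approximately even. Comparing forecasts across different seeds or sampling rules is one way the sensitivity raised above could be examined.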

    3. Reviewer #2 (Public review):

      Summary:

      The authors have examined the effects of two parameters that could improve their clade forecasting predictions for A(H3N2) seasonal influenza viruses based solely on analysis of haemagglutinin gene sequences deposited on the GISAID Epiflu database. Sequences were analysed from viruses collected between April 1, 2005 and October 1, 2019. The first parameter was the lag period (0, 1, or 3 months) for sequences to be deposited in GISAID from the time the viruses were sequenced. The second parameter was how far forward the forecast projected (3, 6, 9, or 12 months). Their conclusion (not surprisingly) was that "the single most valuable intervention we could make to improve forecast accuracy would be to reduce the forecast horizon to 6 months or less through more rapid vaccine development". This is not practical using conventional influenza vaccine production and regulatory procedures. Nevertheless, this study does identify some practical steps that could improve the accuracy and utility of forecasting, such as the modifications suggested by the authors: "... changing the start and end times of our long-term forecasts. We could change our forecasting target from the middle of the next season to the beginning of the season, reducing the forecast horizon from 12 to 9 months."

      Strengths:

      The authors are very familiar with the type of forecasting tools used in this analysis (LBI and mutational load models) and the processes currently used for influenza vaccine virus selection by the WHO committees, having participated in a number of WHO Influenza Vaccine Consultation meetings for both the Southern and Northern Hemispheres.

      Weaknesses:

      Limiting the forecast horizon to 6 months would only be achievable with mRNA, not with the current influenza vaccine production platforms. However, there are no currently approved mRNA influenza vaccines, and mRNA influenza vaccines have yet to demonstrate their real-world efficacy, longevity, and cost-effectiveness; they are therefore only a potential platform for a future influenza vaccine. Hence other avenues to improve the forecasting should be investigated.

      While it is inevitable that more influenza HA sequences will become available over time, a better understanding of where new influenza variants emerge would enable a higher weighting to be used for those countries rather than giving an equal weighting to all HA sequences.

      Also, other groups are considering neuraminidase sequences and how these contribute to the emergence of new or potentially predominant clades.

    4. Author response:

      Thank you to the reviewers and editors for their positive and constructive comments. Based on this feedback, we can see that we need to clarify that the primary goal of this paper is a test of potential changes in public health policy rather than a test of technical improvements to forecasting models. We briefly summarize the primary goal below to address these public reviews and list our proposed revisions to the manuscript based on reviewer feedback.

      All real-time forecasting models contend with 2 major constraints:

      (1) How far into the future they have to predict

      (2) How rapidly the data used for predictions become available in real time

      In the case of evolutionary influenza forecasts, the current values of these constraints are 1) 12 months into the future and 2) an average lag of ~3 months for hemagglutinin (HA) sequences to become available after sample collection. Regardless of the predictors we use in these models (genetic or phenotypic), our units of prediction always depend on HA: the HA protein is the primary target of our immunity, HA is the only gene whose composition is determined by the vaccine selection process, and influenza diversity is historically defined by clades in HA phylogenies.

      Our primary goal of this study was to understand the relative effect sizes of these two common constraints on forecasting while holding all other variables as constant as possible. With this understanding, we hoped to better inform public health priorities and set realistic expectations for current and future forecasting efforts regardless of the technical specifications of each forecasting model. In other words, the goal of this study was not to optimize prediction methods but to estimate the effects of potential policy changes on forecast accuracy.

      We found that reducing how far into the future we need to predict consistently reduced our forecasting error in simulated populations (where we knew the true fitness of each virus) and in natural populations (where we either estimated fitness from genetic predictors or we knew the true fitness of each virus based on its future success). Figure 6 and its first supplemental figure show these effect sizes for natural and simulated populations, respectively, when the future fitness of each virus is known at the time of prediction. By definition, we cannot hope to improve our estimates of viral fitness for these forecasts by using other genetic or phenotypic information.

      Figure 6 shows that reducing how far into the future we need to predict from 12 to 6 months improves our forecasting accuracy 3 times as much as reducing the lag between sample collection and HA sequence submission to public databases. The impact of this finding is the confirmation that a faster vaccine development process would improve our forecast accuracy substantially more than faster turnaround between sample collection and sequence submission. If our public health goal is to make better predictions of future influenza populations, then this result indicates that our main priority is to speed up the vaccine development process.

      If our public health goal is to better understand the composition of currently circulating influenza populations (the units of our forecasts), then Figure 3 shows that reducing the lag between sample collection and HA sequence submission from ~3 months on average to 1 month on average reduces our uncertainty in current clade frequency estimates by half. This impact is also independent of the predictors we use in our forecasting models and is not lessened by the lack of other genetic or phenotypic information in our analyses.

      We realize that neither a 6-month vaccine development process nor a 1-month average sequence submission lag exist yet, but we believe that these are realistic and achievable goals for scientific and public health communities. We also realize that these public health goals are not mutually exclusive. By measuring the effects of these realistic changes to current policy through our forecasting experiments, we hope to inspire and motivate researchers and decision-makers who are empowered to make both of these goals a reality.

      Finally, we want to emphasize that the use of phenotypic data in forecasts introduces additional delays caused by the lag between when genetic sequences become available and when serological experiments can be performed. Most WHO influenza collaborating centers use a "sequence-first" approach where they characterize the genetic sequence and use available sequences to prioritize phenotypic experiments with serology. This additional lag in availability of phenotypic data means that a forecasting model based on genetic and phenotypic data will necessarily have a greater lag in data availability than a model based on genetic data only. This lag is important for practical forecasts, too, but because the lag reflects specific characteristics of each collaborating center and not a global policy change, we believe this topic falls outside of the scope of this study.

      Based on these public reviews and the private recommendations from reviewers, we plan to make the following revisions to this manuscript.

      ● Clarify the introduction, discussion, and abstract to emphasize the primary goal for this study to test effects of realistic changes to public health policy and note that this study does not cover improvements to forecasting models. As part of these changes, we will include a rationale for our choice of a genetic-information-only approach rather than a model that integrates phenotypic data. We will also refine Figure 1 to more clearly communicate the two factors we tested in this study.

      ● Provide a clearer explanation for the subsampling approach we use, include supplemental materials to communicate the geographic and temporal biases that exist in available HA sequence data, and discuss potential effects of different subsampling strategies.

      ● Evaluate the robustness of our results to different randomly subsampled data. We will perform additional technical replicates of our analysis workflow for natural populations, and summarize the effects of realistic interventions across replicates in a supplemental figure and the main text of the results.

      ● Investigate time-dependent effects of forecast horizons and submission lags on model accuracy to identify any potential biases in accuracy during specific historical epochs or any seasonal trends in accuracy associated with predicting future populations for the Northern or Southern Hemispheres.

      ● In the discussion, clarify how reducing submission lags would practically improve the WHO's ability to select vaccine candidate viruses and minimize jargon that currently makes the discussion less accessible to the average reader.

      ● Investigate how changes in forecast horizons and submission lags change the distance between predicted and observed future populations at antigenic positions (i.e., "epitope" positions) to understand whether we see the same effects with that subset of positions as we see across all HA positions.

    1. Author Response:

      We greatly appreciate the feedback provided by reviewers on this manuscript. One of our key objectives was to provide a comprehensive, detailed resource for researchers using single-cell transcriptomics to study arthritis, especially immune cells like macrophages. We strived to perform thorough, wide-ranging analyses that are both accessible and useful to other scientists in the field, and that we hope will serve as the basis for many future avenues of study. As such, we acknowledge that this work is a “first step”, providing a strong descriptive foundation with some mechanistic insight that we and others will continue pursuing. Preliminary studies in our laboratory seeking to dissect signaling mechanisms associated with the M-CSF pathway have illuminated how complex and context-dependent this signaling is, which is an important consideration for future in vivo investigations. Further, it is indeed true that attempting to harmonize transcriptomic data across studies, models, laboratories, and dissection/processing methods is fraught with difficulty and prone to misinterpretation – and we made an effort to highlight this in our manuscript, particularly with respect to where synovial immune cells were recovered from, and how. We encourage healthy discussion within the field for developing shared, unified protocols for harvests and processing upstream of transcriptomic experiments.

    1. eLife Assessment

      The authors report how a previously published method, ReplicaDock, can be used to improve predictions from AlphaFold-multimer (AFm) for protein docking studies. The level of improvement is modest for cases where AFm is successful; for cases where AFm is not as successful, the improvement is more significant, although the accuracy of prediction is also notably lower. The evidence for the ReplicaDock approach being more predictive than AFm is particularly convincing for the antibody-antigen test case. Overall, the study makes a valuable contribution by combining data- and physics-driven approaches.

    2. Reviewer #1 (Public review):

      Summary:

      The authors wanted to use AlphaFold-multimer (AFm) predictions to reduce the challenge of physics-based protein-protein docking.

      Strengths:

      They found two features of AFm predictions that are very useful. 1) pLDDT is predictive of flexible residues, which they could target for conformational sampling during docking; 2) the interface-pLDDT score is predictive of the quality of AFm predictions, which allows the authors to decide whether to do local or global docking.

      Weaknesses:

      (1) As admitted by the authors, the AFm predictions for the main dataset are undoubtedly biased because these structures were used for AFm training. Could the authors find a way to assess the extent of this bias?

      (2) For the CASP15 targets where this bias is absent, the presentation was very brief. In particular, I'm interested in seeing how AFm helped with the docking? They may even want to do a direct comparison with docking results w/o the help of AFm.

      Comments on revisions:

      This revision has addressed my previous comments.

    3. Reviewer #2 (Public review):

      Summary:

      In short, this paper uses a previously published method, ReplicaDock, to improve predictions from AlphaFold-multimer. The method generated about 25% more acceptable predictions than AFm, but more important is improving an Antibody-antigen set, where more than 50% of the models become improved.

      When looking at the results in more detail, it is clear that for the models where the AFm models are good, the improvement is modest (or not at all). See, for instance, the blue dots in Fig 6. However, in the cases where AFm fails, the improvement is substantial (red dots in Fig 6), but no models reach a very high accuracy (Fnat ~0.5 compared to 0.8 for the good AFm models). So the paper could be summarized by claiming, "We apply ReplicaDock when AFm fails", instead of trying to sell the paper as an utterly novel pipeline. I must also say that I am surprised by the excellent performance of ReplicaDock - it seems to be a significant step ahead of other (not AlphaFold) docking methods, and from reading the original paper, that was unclear. Having a better benchmark of it alone (without AFm) would be very interesting.

      These results also highlight several questions I try to describe in the weakness section below. In short, they boil down to the fact that the authors must show how good/bad ReplicaDock is at all targets (not only the ones where AFm fails). In addition, I have several more technical comments.

      Strengths:

      Impressive increase in performance on AB-AG set (although a small set and no proteins).

      Weaknesses:

      The presentation is a bit hard to follow. The authors mix several measures (Fnat, iRMS, RMSDbound, etc). In addition, it is not always clear what is shown. For instance, in Fig 1, is the RMSD calculated for a single chain or the entire protein? I would suggest that the author replace all these measures with two: TM-score when evaluating the quality of a single chain and DockQ when evaluating the results for docking. This would provide a clearer picture of the performance. This applies to most figures and tables. For instance, Fig 9 could be shown as a distribution of DockQ scores.

      The improvements on the models where AFm is good are minimal (if at all), and it is unclear how global docking would perform on these targets, nor exactly why the plDDT<0.85 cutoff was chosen. To better understand the performance of ReplicaDock, the authors should therefore (i) run global and local docking on all targets and report the results, (ii) report the results if AlphaFold (not multimer) models of the chains were used as input to ReplicaDock (I would assume it is similar). These models can be downloaded from AlphaFoldDB.

      Further, it would be interesting to see if ReplicaDock could be combined with AFsample (or any other model to generate structural diversity) to improve performance further.

      The estimates of computing costs for the AFsample are incorrect (check what is presented in their paper). What are the computational costs for ReplicaDock global docking?

      It is unclear strictly what sequences were used as input to the modelling. The authors should use full-length UniProt sequences if that was not done.

      The antibody-antigen dataset is small. It could easily be expanded to thousands of proteins. It would be interesting to know the performance of ReplicaDock on a more extensive set of Antibodies and nanobodies.

      Using pLDDT on the interface region to identify good/bad models is likely suboptimal. It was acceptable (as a part of the score) for AlphaFold-2.0 (monomer), but AFm behaves differently. Here, AFm provides a direct score to evaluate the quality of the interaction (ipTM or Ranking Confidence). The authors should use these to separate good/bad models (for global/local docking), or at least show that these scores are less good than the one they used.

      Comments on revisions:

      The inclusion of the DockQ improved the paper. No further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Summary:

      The authors wanted to use AlphaFold-multimer (AFm) predictions to reduce the challenge of physics-based protein-protein docking.

      Strengths:

      They found that two features of AFm predictions are very useful. 1) pLDDT is predictive of flexible residues, which they could target for conformational sampling during docking; 2) the interface-pLDDT score is predictive of the quality of AFm predictions, which allows the authors to decide whether to do local or global docking.

      Weaknesses:

      (1) As admitted by the authors, the AFm predictions for the main dataset are undoubtedly biased because these structures were used for AFm training. Could the authors find a way to assess the extent of this bias?

      Indeed, AFm training included most of the structures in the DB5 benchmark, as many structures (either unbound or bound) were deposited before the training cut-off date. One of the challenges of estimating this bias is the availability of new structures, both bound and unbound, deposited after the training cut-off. Estimating the extent of training bias is therefore conditional on these factors and difficult. A few studies have attempted to address this bias (Yin et al, 2022, https://doi.org/10.1002/pro.4379).

      In our study, we assess this bias by comparing the AFm structures to the bound and unbound forms and calculating their Ca RMSDs and TM-scores (new addition). We now elaborate in the Results:Dataset curation section and we have added a figure comparing the TM-scores in the supplement.

      We added a clarifying text and a note about the TM-score calculation in the manuscript as follows:

      “Since most of the benchmark targets in DB5.5 were included in AlphaFold training, there would be training bias associated with their predictions (i.e. our measured success rates are an upper bound).”

      “We also calculated the TM-scores of the AFm predicted complex structures with respect to the bound and the unbound crystal structures (Supplementary Figure S2). As TM-scores reflect a global comparison between structures and are less sensitive to local structural deviations, no strong conclusions could be derived. This is in agreement with our intuition that since both unbound and bound states of proteins will share a similar fold, and AlphaFold can predict structures with high TM-scores in most cases, gauging the conformational deviations with TM-scores would be inconclusive.”

      (2) For the CASP15 targets where this bias is absent, the presentation was very brief. In particular, it would be interesting to see how AFm helped with the docking. The authors may even want to do a direct comparison with docking results without the help of AFm.

      Unfortunately since this was a CASP-CAPRI round, the structure of the unbound Antigen or the nanobodies was unavailable. Thus we cannot perform a comparison without using AF2 at all since we need a structure prediction tool to produce the unbound nanobody and the nanobody-antigen complex template structure to dock. This has been clarified in the main text for better understanding for the readers.

      “Since the nanobody-antigen complexes were CASP targets, we did not have unbound structures, rather only the sequences of individual chains. Therefore, for each target, we employed the AlphaRED strategy as described in Fig 7.”

      Reviewer #1 (Recommendations For The Authors):

      For suggestions for major improvements, see comments under weaknesses. One additional suggestion: the authors found that pLDDT is predictive of flexible residues. Can they try to find AFm features that are predictive of the interface site? Such information may guide their docking to a local site.

      This is a great idea that we and others have been thinking about considerably. Prior work by Burke et al. (Towards a structurally resolved human protein interaction network) examines AlphaFold’s ability to predict PPIs. For high-confidence predicted models of interacting protein complexes, the authors showed that pDockQ correlated reasonably well with correct protein interactions.

      That being said, binding site identification, particularly in a partner-agnostic fashion, i.e. determining binding patches on a given protein, is an area of ongoing research. We hope a future study examines AlphaFold3 or ESM3 specifically for this task.

      “Further, we tested multiple thresholds to estimate the optimum cut-off for distinguishing near-native structures (defined as an interface-RMSD < 4 Å) from the predictions. Figure 3.B summarizes the performance with a confusion matrix for the chosen interface-pLDDT cutoff of 85. 79% of the targets are classified accurately with a precision of 75%, thereby validating the utility of interface-pLDDT as a discriminating metric to rank the docking quality of the AFm complex structure predictions. With AlphaFold3 and ESM3 being released, investigating features that could predict flexible residues or interface site would be valuable, as this information may guide local docking.”

      Minor:

      Page 3, lines 73-77, state how many targets were curated from DB5.5.

      We have now clarified this in the manuscript. All 254 targets were curated from DB5.5 at the time of this benchmark study.

      “For each protein target, we extracted the amino acid sequences from the bound structure and predicted a corresponding three-dimensional complex structure with the ColabFold implementation of the AlphaFold multimer v2.3.0 (released in March 2023) for the 254 benchmark targets from DB5.5.”

      In Figure 1, the color used for medium is too difficult to distinguish from the grey color used for rigid.

      We thank you for this suggestion. We have updated the color to olive. Further, based on Reviewer 2’s suggestions, we have moved this plot to the Supplementary.

      Reviewer #2 (Public Review):

      Summary:

      In short, this paper uses a previously published method, ReplicaDock, to improve predictions from AlphaFold-multimer. The method generated about 25% more acceptable predictions than AFm, but more important is improving an Antibody-antigen set, where more than 50% of the models become improved.

      When looking at the results in more detail, it is clear that for the models where the AFm models are good, the improvement is modest (or not at all). See, for instance, the blue dots in Figure 6. However, in the cases where AFm fails, the improvement is substantial (red dots in Figure 6), but no models reach a very high accuracy (Fnat ~0.5 compared to 0.8 for the good AFm models). So the paper could be summarized by claiming, "We apply ReplicaDock when AFm fails", instead of trying to sell the paper as an utterly novel pipeline. I must also say that I am surprised by the excellent performance of ReplicaDock - it seems to be a significant step ahead of other (not AlphaFold) docking methods, and from reading the original paper, that was unclear. Having a better benchmark of it alone (without AFm) would be very interesting.

      We thank the reviewer for highlighting the performance of ReplicaDock. ReplicaDock alone is benchmarked in the original paper (10.1371/journal.pcbi.1010124), with full details on the 2022 version of DB5.5 in the supplement. Indeed ReplicaDock2 achieves the highest reported success rates on flexible docking targets reported in the literature (until this AlphaRED paper!).

      Regarding this statement about “the paper could be summarized…” it might be helpful to give more context. ReplicaDock is a replica exchange Monte Carlo sampling approach for protein docking that incorporates flexibility in an induced-fit fashion. However, the choice of which backbone residues to move is solely dependent on contacts made during each docking trajectory. In the last section of the ReplicaDock paper, we introduced “Directed Induced-fit” where we biased the backbone sampling only towards those residues where we knew the backbone is flexible (this information is obtained because for the benchmark set, we had both unbound and bound structures and hence could cherry-pick the specific residues which are mobile). We agree with the reviewers that AlphaRED is essentially a derivative of ReplicaDock, however, the two major claims that we make in this paper are:

      (1) AlphaFold pLDDT is an effective predictor of backbone flexibility for practical use in docking.

      (2) We can automate the Directed InducedFit approach within ReplicaDock by utilizing this pLDDT information per residue for conformational sampling in protein docking; and in doing so, create a pipeline that would allow us to go from sequence-to-structure-to-complex, specifically capturing conformational changes.
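      A minimal sketch of the idea in claim (2), assuming per-residue pLDDT values are already available as a list on the 0-100 scale. The function names and the cutoff of 80 are hypothetical placeholders chosen for illustration; they are not taken from the manuscript or from the ReplicaDock code:

```python
from typing import List, Sequence, Tuple


def flexible_residues(plddt: Sequence[float], cutoff: float = 80.0) -> List[int]:
    """Return 0-based indices of residues whose pLDDT is below `cutoff`.

    The cutoff is an illustrative assumption, not a value from the manuscript.
    """
    return [i for i, score in enumerate(plddt) if score < cutoff]


def contiguous_segments(indices: Sequence[int]) -> List[Tuple[int, int]]:
    """Group sorted residue indices into inclusive (start, end) runs."""
    segments: List[Tuple[int, int]] = []
    for i in indices:
        if segments and i == segments[-1][1] + 1:
            # Extend the current run by one residue.
            segments[-1] = (segments[-1][0], i)
        else:
            # Start a new run.
            segments.append((i, i))
    return segments


if __name__ == "__main__":
    example_plddt = [95.0, 72.0, 60.0, 91.0, 79.9, 80.0]  # made-up values
    mobile = flexible_residues(example_plddt)
    print(mobile, contiguous_segments(mobile))
```

      Grouping low-confidence residues into segments is only one plausible way to hand a set of mobile backbone regions to a docking protocol; the actual selection logic used by the authors may differ.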

      To conclude these claims, we pose the following questions in the Introduction:

      “(1) Do the residue-specific estimates from AF/AFm relate to potential metrics demonstrating conformational flexibility?

      (2) Can AF/AFm metrics deduce information about docking accuracy?

      (3) Can we create a docking pipeline for in-silico complex structure prediction incorporating AFm to convert sequence-to-structure-to-docked complexes?”

      This work requires a pipeline, the center of which lies in ReplicaDock as a docking method, but has functionalities that were absent in prior work. The goal is also to develop a one-stop shop without manual intervention (a prerequisite for biasing backbone sampling in ReplicaDock) that could be utilized by structural biologists efficiently.

      We clarify these points in the abstract and main text as follows:

      Abstract: “In this work, we combine AlphaFold as a structural template generator with a physics-based replica exchange docking algorithm to better sample conformational changes.”

      Introduction:

      “The overarching goal is to create a one-stop, fully-automated pipeline for simple, reproducible, and accurate modeling of protein complexes. We investigate the aforementioned questions and create a protocol to resolve AFm failures and capture binding-induced conformational changes. We first assess the utility of AFm confidence metrics to detect conformational flexibility and binding site confidence.”

      These results also highlight several questions I try to describe in the weakness section below. In short, they boil down to the fact that the authors must show how good/bad ReplicaDock is at all targets (not only the ones where AFm fails). In addition, I have several more technical comments.

      Strengths:

      Impressive increase in performance on AB-AG set (although a small set and no proteins).

      We thank the reviewer for their comments.

      Weaknesses:

      The presentation is a bit hard to follow. The authors mix several measures (Fnat, iRMS, RMSDbound, etc). In addition, it is not always clear what is shown. For instance, in Figure 1, is the RMSD calculated for a single chain or the entire protein? I would suggest that the author replace all these measures with two: TM-score when evaluating the quality of a single chain and DockQ when evaluating the results for docking. This would provide a clearer picture of the performance. This applies to most figures and tables.

      We apologize for the lack of clarity owing to the different metrics. Irms and fnat are standard performance metrics in the docking field, but we agree that DockQ would be simpler when the details of the other metrics are not required. We have updated Figure 5 and Figure 8 to also show DockQ comparisons.

      Regarding Figure 1, as highlighted in Line 90 of the main-text, “Figure 1 shows the Ca-RMSD of all protein partners of the AFm predicted complex structures with respect to the bound and the unbound.” As suggested by the reviewer in their further comments, we have moved this Figure to the Supplementary. We have also included a TM-score comparison in the Supplementary (Supplementary Figure S2) and included clarifying statements in the main text:

      “We also tested TM-scores to measure the structural deviations of the AFm predicted complex structures with respect to the bound and unbound structures (Supplementary Figure S2). However, this metric is not sensitive enough to detect the subtle, local conformational changes upon binding.”

      For instance, Figure 9 could be shown as a distribution of DockQ scores.

      We have now updated Figure 5 to include DockQ scores in Panel D. Since DockQ is a function of iRMSD, fnat and L-RMSD, it shows the cumulative improvement in performance. However, the individual metrics capture nuances that DockQ hides: for example, when the protocol improves iRMSD considerably but fnat improvement is lacking, they can indicate whether the challenge lies in backbone sampling or in side-chain refinement. Therefore, we retain the iRMSD and fnat metrics in panels A-C. We have incorporated this in the main text as follows:

      “Finally, to evaluate docking success rates, we calculate DockQ for top predictions from AFm and AlphaRED respectively (Figure 5D). AlphaRED demonstrates a success rate (DockQ>0.23) for 63% of the benchmark targets. Particularly for Ab-Ag complexes, AFm predicted acceptable or better quality docked structures in only 20% of the 67 targets. In contrast, the AlphaRED pipeline succeeds in 43% of the targets, a significant improvement.”
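      For readers unfamiliar with how DockQ combines the three metrics, the following is a sketch under stated assumptions. The scaling constants (1.5 Å for iRMSD, 8.5 Å for L-RMSD) and the quality bands (0.23, 0.49, 0.80) follow the commonly cited DockQ definition rather than anything stated in this response, and the function names are illustrative:

```python
from typing import Sequence


def _scaled(rmsd: float, d: float) -> float:
    """Map an RMSD in angstroms to (0, 1]; 1.0 means a perfect match."""
    return 1.0 / (1.0 + (rmsd / d) ** 2)


def dockq(fnat: float, irmsd: float, lrmsd: float) -> float:
    """Average of fnat and the scaled interface and ligand RMSDs."""
    return (fnat + _scaled(irmsd, 1.5) + _scaled(lrmsd, 8.5)) / 3.0


def capri_class(score: float) -> str:
    """Translate a DockQ score into a CAPRI-style quality label."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


def success_rate(scores: Sequence[float], threshold: float = 0.23) -> float:
    """Fraction of targets at or above the acceptable-quality threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


if __name__ == "__main__":
    s = dockq(fnat=0.5, irmsd=2.0, lrmsd=6.0)  # made-up inputs
    print(round(s, 3), capri_class(s))
```

      Because the score averages three terms, a model can gain in iRMSD while fnat stays flat and still move up a quality band, which is why the separate panels remain informative alongside the combined score.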

      Further, we have reevaluated success rates in Figure 8 (previously Figure 9) and have updated the manuscript to report these updated success rates.

      “By utilizing the AlphaRED strategy, we show that failure cases in AFm predicted models are improved for all targets (lower Irms for 97 of 254 failed targets) with CAPRI acceptable-quality or better models generated for 62% of targets overall (Fig 8)”.

      The improvements on the models where AFm is good are minimal (if at all), and it is unclear how global docking would perform on these targets, nor exactly why the plDDT<0.85 cutoff was chosen.

      We agree with the reviewers that the improvement on the models with good AFm predictions is minimal. We acknowledge this in the text now as follows:

      “Most of the improvements in the success rates are for cases where AFm predictions are worse. For targets with good AFm predictions, AlphaRED refinement results in minimal improvements in docking accuracy.”

      The choice of pLDDT cutoff = 85 is elaborated in the “Interface-pLDDT correlates with DockQ and discriminates poorly docked structures” section, paragraph 3. Briefly, we tested multiple metrics and the interface pLDDT had the highest AUC, indicating that it is the best metric for this task. For interface-pLDDT we tested multiple thresholds, and the cutoff of 85 resulted in the highest percentage of true-positive and true-negative rates. This is illustrated with the confusion matrix in Figure 3.B with the precision scores. We now clarify this in the text as follows:

      “With interface-pLDDT as a discriminating metric, we tested multiple thresholds to estimate the optimum cut-off for distinguishing near-native structures (defined as an interface-RMSD < 4 Å) from the predictions. Figure 3B summarizes the performance with a confusion matrix for the chosen interface-pLDDT cutoff of 85. 79% of the targets are classified accurately with a precision of 75%, thereby validating the utility of interface-pLDDT as a discriminating metric to rank the docking quality of the AFm complex structure predictions.”
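      The threshold evaluation described above can be sketched as follows. This is an illustration, not the authors' analysis code: the function names are hypothetical, the example values are made up, and treating a prediction at exactly the cutoff as "good" is an assumption:

```python
from typing import Dict, Sequence


def confusion_at_cutoff(
    interface_plddt: Sequence[float],
    interface_rmsd: Sequence[float],
    plddt_cutoff: float = 85.0,
    rmsd_cutoff: float = 4.0,
) -> Dict[str, int]:
    """Count TP/FP/TN/FN when interface-pLDDT is used to flag near-native models.

    A model is truly near-native if its interface RMSD is below `rmsd_cutoff`
    and is predicted near-native if its interface-pLDDT is >= `plddt_cutoff`.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for plddt, rmsd in zip(interface_plddt, interface_rmsd):
        predicted_good = plddt >= plddt_cutoff
        actually_good = rmsd < rmsd_cutoff
        if predicted_good and actually_good:
            counts["tp"] += 1
        elif predicted_good:
            counts["fp"] += 1
        elif actually_good:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    total = sum(c.values())
    return (c["tp"] + c["tn"]) / total if total else 0.0


def precision(c: Dict[str, int]) -> float:
    flagged = c["tp"] + c["fp"]
    return c["tp"] / flagged if flagged else 0.0


def docking_mode(interface_plddt: float, cutoff: float = 85.0) -> str:
    """Route confident interfaces to local refinement, the rest to global docking."""
    return "local" if interface_plddt >= cutoff else "global"


if __name__ == "__main__":
    c = confusion_at_cutoff([90, 88, 70, 86, 60], [1.0, 5.0, 6.0, 2.0, 3.0])
    print(c, accuracy(c), precision(c))
```

      Sweeping `plddt_cutoff` over a range of values and comparing the resulting counts is one simple way to reproduce the kind of threshold search the response describes.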

      To better understand the performance of ReplicaDock, the authors should therefore (i) run global and local docking on all targets and report the results, (ii) report the results if AlphaFold (not multimer) models of the chains were used as input to ReplicaDock (I would assume it is similar). These models can be downloaded from AlphaFoldDB.

      The performance of ReplicaDock on DB5.5 is tabulated in our prior work (https://doi.org/10.1371/journal.pcbi.1010124) and we direct the reviewers there for the detailed performance and results. In our opinion, the benchmark suggested by the reviewer would be redundant and not worth the computational expense.

      The scope of this paper is to highlight a structure prediction + physics-based modeling pipeline for docking to adapt to the accuracy of up-and-coming structure prediction tools.

      Using AlphaFold monomer chains as input and benchmarking on that, albeit interesting scientifically, will not be useful for either the pipeline or biologists who would want a complex structure prediction. We thank the reviewer for their comments but want to reemphasize that the end goal of this work is to increase the accuracy of complex structure predictions and PPIs obtained from computational tools.

      Further, it would be interesting to see if ReplicaDock could be combined with AFsample (or any other model to generate structural diversity) to improve performance further.

      We would like to highlight that ReplicaDock is a stand-alone tool for protein docking and here we demonstrate the ability of adapting it with metrics derived from AlphaFold or other structure prediction tools (say ESMFold) such as pLDDT for conformational sampling and improving docking accuracy. We definitely agree that adapting it to use with tools such as AFSample will be interesting but it is out of scope of this work.

      The estimates of computing costs for the AFsample are incorrect (check what is presented in their paper). What are the computational costs for ReplicaDock global docking?

      The authors of the AFSample paper report that “AFsample requires more computational time than AF2, as it generates 240 models, and including the extra recycles, the overall timing is 1000 more costly than the baseline.” We have reported these exact numbers in our manuscript.

      The computational costs of ReplicaDock are 8-72 CPU hours on a single node with 24 processors as reported in our prior work.

      For AlphaRED, the costs are slightly higher owing to the structure prediction module in the beginning and are up to 100 CPU hrs for our largest (max Nres) target.

      It is unclear strictly what sequences were used as input to the modelling. The authors should use full-length UniProt sequences if they were not done.

      We report this in the methods section of the manuscript as well as in Figure 5. Full length complex sequences were used for the models that we extracted from DB5.5.

      “As illustrated in Fig. 5, given a sequence of a protein complex, we use the ColabFold implementation of AF2-multimer to obtain a predictive template.”

      We clarify this in the methods section as:

      “For each target in the DB5.5 dataset, we first extracted the corresponding FASTA sequence for the bound complex and then obtained AlphaFold predicted models with the ColabFold v1.5.2 implementation of AlphaFold and AlphaFold-multimer (v.2.3.0).”

      The antibody-antigen dataset is small. It could easily be expanded to thousands of proteins. It would be interesting to know the performance of ReplicaDock on a more extensive set of Antibodies and nanobodies.

      This work demonstrates the performance on the docking benchmark, i.e. given the unbound structures, can one predict the bound complex? In this regard, our analysis has focused on targets where both the unbound and bound structures are available so that we could evaluate the ability of AlphaRED to model protein flexibility and docking accuracy. For antibody-antigen complexes, there are only 67 structures with both unbound and bound complexes available and they constituted our dataset. Benchmarking AlphaRED on all antibody-antigen targets can give biased results as most Ab-Ag complexes are in the AlphaFold training set. Further, our work is aimed more towards predicting conformational flexibility in docking and not rigid-body docked complexes, so benchmarking on existing bound Ab-Ag structures is out of scope for this work.

      Using pLDDT on the interface region to identify good/bad models is likely suboptimal. It was acceptable (as a part of the score) for AlphaFold-2.0 (monomer), but AFm behaves differently. Here, AFm provides a direct score to evaluate the quality of the interaction (ipTM or Ranking Confidence). The authors should use these to separate good/bad models (for global/local docking), or at least show that these scores are less good than the one they used.

      We thank the reviewers for this suggestion.

      Reviewer #2 (Recommendations For The Authors):

      Some Figures could be skipped/improved

      Fig 1: Use TM-score instead, a much better measure (and the figure is not necessary).

      Figure 1 compares the bias of AlphaFold towards unbound or bound forms of the proteins. We believe that this figure highlights the slight inherent bias of AlphaFold towards bound structures over unbound.

      As the reviewers have suggested we have included a plot comparing the TM-scores for the structures. Further, we have moved this figure to the Supplementary.

      Fig 2. Skip B (why compare RMSD with pLDDT?). Add a figure to see how this correlates over all targets not just two.

      RMSD and LDDT both represent metrics to evaluate conformational variability between two structures, such as the bound and unbound forms of the same protein structure. On one hand where RMSD measures overall deviation of residues, LDDT allows the estimation of relative domain orientations and concerted proteins. We have elaborated this in Methods as well as in the Results section titled “AlphaFold pLDDT provides a predictive confidence measure for backbone flexibility”.

      The data for the benchmark targets is now included in the Supplementary (Supplementary Figures S3-S4).

      Fig 3. Color the different chains of a protein differently. Thereby the Receptor/Ligand/Bound labels can be omitted.

      We thank the reviewers for this suggestion. However, the color scheme is chosen to highlight (1) the relative orientation of protein partners relative to each other. We have ensured that the alignment is over one partner (Receptor) so that you could see the relative orientation of the other partner (Ligand) in the modeled protein over the bound structure (in one color). (2) The coloring of the receptor and ligand chain is by pLDDT (from red to blue) to highlight that for decoys with incorrectly predicted interfaces, the pLDDT scores of the interface residues are indeed lower and can be a discriminating metric. We elaborate this in the caption of Figure 3 as well as in the section “Interface-pLDDT correlates with DockQ and discriminates poorly docked structures”. Coloring the chains of a protein differently will obfuscate the point that we are aiming to make and will be inconclusive for the readers as they would need to rely only on quantitative metrics (Irms and DockQ) reported but won’t be able to visualize the interface pLDDT of the incorrectly bound structures. We hope that this justifies the choice of our color scheme.

      Fig 4. Include RankConf, ipTM, pDockQ, and other measures in the plots (they are likely better). Include DockQ for the top targets. It is difficult to estimate for multi chain complexes.

      We thank the reviewer for this suggestion. We have now included the DockQ performances for all targets in Figure 5 (previously Figure 6) as well as re-evaluated our final success rates based on the DockQ calculations in Figure 8 (previously Figure 9).

      Fig 5. use a better measure to split (see above).

      We have elaborated on the choice of the split for the comments above and the interface pLDDT threshold of 85 is a decision made post observation on the docking benchmark. We do want to highlight that the cut-off is arbitrary and in our online server (ROSIE) as well as in custom scripts, this cut-off can be tuned by the user as required. We would suggest a cut-off of 85 based on our observations but the users are welcome to tune this as per their needs.

      Fig 6. Replace lrms/fnat with DockQ.

      We have now included DockQ scores in our manuscript.

      Fig 7. Color the different chains of a protein differently.

      We have colored the protein chains differently. AlphaFold models are in Orange, Bound complexes are in Gray, and predicted proteins from AlphaRED are in Blue-Green indicating the two partners. All models are aligned over the receptor so relative orientations of the ligand protein can be observed.

      Fig 8 Color the different chains of a protein differently.

      The chains are colored differently. We would like the reviewer to elaborate more on what they would like to observe as we believe our color scheme makes intuitive sense for readers.

      Fig 9. Use DockQ instead of CAPRI criteria.

      The figure has been updated based on DockQ. To elaborate, the CAPRI criteria are set based on DockQ scores, as described in the figure caption.

    1. eLife Assessment

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. Convincing evidence shows the involvement of m6A in DNA based on single cell imaging and mass spec data. The authors present evidence that the m6A signal does not result from bacterial contamination or RNA, but the text does not make this overly clear.

    2. Reviewer #1 (Public review):

      Summary:

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPR knockout screen.

      Strengths:

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 6mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized.

      Weaknesses:

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.

      Strengths:

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.

      Weaknesses:

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, the catalytic mutant and rescue of Mettl3 may be the further experiments to confirm the conclusion.

    4. Reviewer #3 (Public review):

      Summary:

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.

      Strengths:

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.

      Weaknesses:

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.

      These concerns must also be clearly in the limitations section and even in the results section which fails to nuance the authors' findings.

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma?

      (3) The single cell imaging of 6mA in various cells is nice. The results are confirmed by mass spec as an orthogonal approach. Another orthogonal and quantitative approach to assessing 6mA levels would be PacBio. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      (4) The results of Figure 3 need further investigation and validation. If the results are correct, the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNase treatment. Indeed, binding to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.

      (5) Given the lack of orthogonal validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficulty in assessing 6mA accurately in mammals acknowledged throughout.

    5. Author response:

      eLife Assessment

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. The presented evidence for the involvement of m6A in DNA is incomplete and requires further validation with orthogonal approaches to conclusively show the presence of 6mA in the DNA and exclude that the source is RNA or bacterial contamination.

      We thank the editors for recognizing the importance of our work and the relevance of METTL3 in DNA repair. However, we wholly disagree with the second sentence in the eLife assessment, and we want to clarify why our evidence for the involvement of 6mA in DNA is complete.  

      The identification of 6mA in DNA upon DNA damage is based first on immunofluorescence observations using an anti-m6A antibody. In this setting, removal of RNA with RNase treatment fails to reduce the 6mA signal, excluding the possibility that the source of signal is RNA. In contrast, removal of DNA with DNase treatment removes all 6mA signal, strongly suggesting that the species carrying the N6-methyladenosine modification is DNA (Figure 3D, E). Importantly, in Figure 3F, we provide orthogonal, quantitative mass spectrometry data that independently confirm this finding. Mass spectrometry-liquid chromatography of DNA analytes conclusively shows the presence of 6mA in DNA upon treatment with DNA damaging agents and excludes that the source is RNA, based on exact mass. Reviewer #2 recognized the strengths of this approach to generate solid evidence for 6mA in DNA.

      Cells only show the 6mA signal when treated with DNA damaging agents, and the 6mA is absent from untreated cells (Figure 3D, E, F). This provides strong evidence that the 6mA signal is not a result of bacterial contamination in our cell lines. Moreover, our cell lines are routinely tested for mycoplasma contamination. It could be possible that stock solutions of DNA damaging agents may be contaminated, but this would need to be true for all individual drugs and stocks tested. The data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3G, H) provides strong evidence against bacterial contamination in our stocks.  

      In summary, we provide conclusive evidence, based on orthogonal methods, that the METTL3-dependent N6-methyladenosine modification is deposited in DNA, not RNA, in response to DNA damage. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPER knockout screen. 

      Typo above: “CRISPER” should be “CRISPR”.

      Strengths: 

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 5mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized. 

      Typo above: “5mA” should be “6mA”.

      Weaknesses:

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.

      The increased presence of 6mA during DNA damage could result from methylation at the A base itself (within DNA) or from incorporation of pre-modified 6mA during DNA synthesis. Our data do not directly discriminate between these two mechanisms, and we will clarify this point in the discussion.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.

      Strengths:

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.

      Weaknesses:

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.

      Our cell lines are routinely tested for bacterial contamination, specifically mycoplasma, and we plan to state this information in a revised version of the manuscript.

      Importantly, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on the presence of DNA damage and not caused by contamination in the cell lines (Figure 3D, E, F). While it could be possible that stock solutions of DNA damaging agents may be contaminated, this would need to be the case for all individual drugs and stocks tested that induce 6mA, which seems very unlikely. Finally, the data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3 G, H) provides strong evidence against bacterial contamination in our drug stocks.

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.

      While we agree that it could be important to understand the precise genomic location of 6mA in relation to DNA damage, this is outside the scope of the current study. Moreover, this exercise may prove unproductive. If 6mA is enriched in DNA at damage sites or as DNA is replicated, the genomic mapping of 6mA is likely to be stochastic. If stochastic, it would be impossible to obtain the read depth necessary to map 6mA accurately.

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, the catalytic mutant and rescue of Mettl3 may be the further experiments to confirm the conclusion. 

      We believe this to be an excellent suggestion from Reviewer #2 but we are unable to perform the proposed experiment at this time. We encourage future studies to explore the rescue experiment.

      Reviewer #3 (Public review):

      Summary:

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.

      Strengths:

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.

      Weaknesses:

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.

      These concerns must also be clearly in the limitations section and even in the results section which fails to nuance the authors' findings.

      We agree with the reviewer that the presence and potential function of 6mA in mammalian DNA have been debated. Importantly, the debate regarding the presence and quantity of 6mA in DNA has previously been restricted to undamaged, baseline conditions. In complete agreement with this notion, we do not detect appreciable levels of 6mA in untreated cells. We will revise the introduction to introduce the debate about 6mA in DNA. We do, however, want to highlight that our study provides, for the first time, convincing evidence (based on orthogonal methods) that 6mA is present in DNA in response to a stimulus, DNA damage.

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma?

      HT-29 is a cell line of colorectal origin and chemotherapeutic agents that introduce uracil and uracil derivatives in DNA, as those used in this study, are relevant for the treatment of colorectal cancer. As indicated above, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on DNA damage and not caused by a potential bacterial contamination (Figure 3D, E, F). Additionally, our cell lines are routinely tested for bacterial contamination, specifically mycoplasma.

      (3) The single-cell imaging of 6mA in various cells is nice but must be confirmed by orthogonal approaches. PacBio would provide an alternative and quantitative approach to assessing 6mA levels. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      We are confused by this point since an orthogonal approach to detect 6mA, mass spectrometry-liquid chromatography, was employed. This method does not use an antibody and confirms the increase of 6mA in DNA when cells were treated with DNA damaging agents. This data is presented in Figure 3F.

      It is sensible to hypothesize that the localization of 6mA is consistent with DNA replication (like uracil deposition). In this event, the genomic mapping of 6mA is likely to be stochastic. This would make quantification with PacBio sequencing difficult because it would be very challenging to achieve the appropriate read depth to call a modified base.

      Dot blots rely on an antibody and thus are not truly orthogonal to our immunofluorescence-based measurements. We preferred the mass spectrometry-liquid chromatography approach we took as a true orthogonal approach.

      (4) The results of Figure 3 need further investigation and validation. If the results are correct, the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNase treatment. Indeed, binding to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.

      Based on this and the following comment, we are convinced that Reviewer #3 has overlooked two critical elements of our study:

      First, the immunofluorescence work presented in Figure 3, showing 6mA signal in response to DNA damage, uses cells that were pre-extracted to remove excess cytoplasmic RNA. This method is often used in immunofluorescence experiments of this kind. The pre-extraction method removes most of the cytoplasmic content, and the majority of the cytoplasmic m6A RNA signal. Supplementary Figure 3D shows cells that have not been pre-extracted prior to staining. These images show the cytoplasmic m6A signal is abundant if we do not perform the pre-extraction step.

      If the antibody used to label 6mA significantly reacted with unmodified adenine, we would expect a large signal in untreated or untreated and denatured conditions. In contrast, an increase in 6mA is not observed in either case.

      Second, the orthogonal approach we employed, mass spectrometry coupled with liquid chromatography, measures 6mA DNA analytes specifically by exact mass. This approach does not depend on an antibody and yields results consistent with those from the immunofluorescence experiments.

      (5) Given the lack of orthologous validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficultly in assessing 6mA accurately in mammals acknowledged throughout.

      Typo above: “difficultly” should be “difficulty”.

      As discussed in response to prior comments, Figure 3 does provide two independent and orthogonal methods that demonstrate 6mA presence in DNA specifically, and not RNA, in response to DNA damage. Complementary datasets are presented using either immunofluorescence microscopy or mass spectrometry-liquid chromatography of extracted DNA. The latter method does not rely on an antibody and can discriminate 6mA DNA versus RNA based on exact mass. We will revise the text to clarify that Figure 3F is a completely orthogonal approach.

    1. eLife Assessment

      This valuable study marks a significant advancement in brain aging research by centering on Asian populations (Chinese, Malay, and Indian Singaporeans), a group frequently underrepresented in such studies. It unveils solid evidence for anatomical differences in brain aging predictors between the young and old age groups. Overall, this study broadens our understanding of brain aging across diverse ethnicities.

    2. Joint Public Review:

      Summary:

      The authors of the study investigated the generalization capabilities of a deep learning brain age model across different age groups within the Singaporean population, encompassing both elderly individuals aged 55 to 88 years and children aged 4 to 11 years. The model, originally trained on a dataset primarily consisting of Caucasian adults, demonstrated a varying degree of adaptability across these age groups. For the elderly, the authors observed that the model could be applied with minimal modifications, whereas for children, significant fine-tuning was necessary to achieve accurate predictions. Through their analysis, the authors established a correlation between changes in the brain age gap and future executive function performance across both demographics. Additionally, they identified distinct neuroanatomical predictors for brain age in each group: lateral ventricles and frontal areas were key in elderly participants, while white matter and posterior brain regions played a crucial role in children. These findings underscore the authors' conclusion that brain age models hold the potential for generalization across diverse populations, further emphasizing the significance of brain age progression as an indicator of cognitive development and aging processes.

      Strengths:

      (1) The study tackles a crucial research gap by exploring the adaptability of a brain age model across Asian demographics (Chinese, Malay, and Indian Singaporeans), enriching our knowledge of brain aging beyond Western populations.<br /> (2) It uncovers distinct anatomical predictors of brain aging between elderly and younger individuals, highlighting a significant finding in the understanding of age-related changes and ethnic differences.

      In summary, this paper underscores the critical need to include diverse ethnicities in model testing and estimation.

      Comments on revisions:

      The previously mentioned weaknesses were addressed in the revision process. As stated earlier, the paper tackles a crucial research gap by exploring the adaptability of a brain-age model across Asian demographics (Chinese, Malay, and Indian Singaporeans), enriching our knowledge of brain aging beyond Western populations.

    1. eLife Assessment

      This valuable study examines the variability in spacing and direction of entorhinal grid cells, providing convincing evidence that such variability helps disambiguate locations within an environment. This study will be of interest to neuroscientists working on spatial navigation and, more broadly, on neural coding.

    2. Reviewer #1 (Public review):

      Summary:

      The present paper by Redman et al. investigated the variability of grid cell properties in the MEC by analyzing publicly available large-scale neural recording data. Although previous studies have proposed that grid spacing and orientation are homogeneous within the same grid module, the authors found a small but robust variability in grid spacing and orientation across grid cells in the same module. The authors also showed, through model simulations, that such variability is useful for decoding spatial position.

      Strengths:

      The results of this study provide novel and intriguing insights into how grid cells compose the cognitive map in the axis of the entorhinal cortex and hippocampus. This study analyzes large data sets in an appropriate manner and the results are convincing.

      Comments on revisions:

      In the revised version of the manuscript, the authors have addressed all the concerns I raised.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents an interesting and useful analysis of grid cell heterogeneity, showing that the experimentally observed heterogeneity of spacing and orientation within a grid cell module can allow more accurate decoding of location from a single module.

      Strengths:

      (1) I found the statistical analysis of the grid cell variability to be very systematic and convincing. I also found the evidence for enhanced decoding of location based on between cell variability within a module to be convincing and important, supporting their conclusions.

      (2) Theoreticians have developed models that focus on the use of grid cells that are highly regular in their parameters, and usually vary only in the spatial phase of cells within modules and the spacing and orientation between modules. This focus on consistency is partly to obtain the generalization of the grid cell code to a broad range of previously unvisited locations. In contrast, most experimentalists working with grid cells know that many if not most grid cells show high variability of firing fields, as demonstrated in the figures in experimental papers. The authors of this current paper have highlighted this discrepancy, and shown that the variability shown in the data could actually enhance decoding of location.

    4. Reviewer #3 (Public review):

      Summary:

      Redman and colleagues analyze grid cell data obtained from public databases. They show that there is significant variability in spacing and orientation within a module. They show that the difference in spacing and orientation for a pair of cells is larger than the one obtained for two independent maps of the same cell. They speculate that this variability could be useful to disambiguate the rat position if only information from a single module is used by a decoder.

      Strengths:

      The strengths of this work lie in its conciseness, clarity, and the potential significance of its findings for the grid cell community, which has largely overlooked this issue for the past two decades. Their hypothesis is well stated and the analyses are solid.

      Weaknesses:

      Major weaknesses identified in the original version have been addressed.

      The authors have addressed all of our concerns, providing control analyses that strengthen their claim.

    1. eLife Assessment

      This important study reports a detailed quantification of the population dynamics of Salmonella enterica serovar Typhimurium in mice. Bacterial burden and founding population sizes across various organs were quantified, revealing pathways of dissemination and reseeding of the gastrointestinal tract from systemic organs. Using various techniques, including genetic distance measurements, the authors present compelling evidence to support their conclusions, thus presenting new knowledge that will be of broad interest to scientists focusing on infectious diseases.

    2. Reviewer #1 (Public review):

      Hotinger et al. explore the population dynamics of Salmonella enterica serovar Typhimurium in mice using genetically tagged bacteria. In addition to physiological observations, pathology assessments, and CFU measurements, the study emphasizes quantifying host bottleneck sizes that limit Salmonella colonization and dissemination. The authors also investigate the genetic distances between bacterial populations at various infection sites within the host.

      Initially, the study confirms that pretreatment with the antibiotic streptomycin before inoculation via orogastric gavage increases the bacterial burden in the gastrointestinal (GI) tract, leading to more severe symptoms and heightened fecal shedding of bacteria. This pretreatment also significantly reduces between-animal variation in bacterial burden and fecal shedding. The authors then calculate founding population sizes across different organs, discovering a severe bottleneck in the intestine, with founding populations reduced by approximately 10^6-fold compared to the inoculum size. Streptomycin pretreatment increases the founding population size and bacterial replication in the GI tract. Moreover, by calculating genetic distances between populations, the authors demonstrate that, in untreated mice, Salmonella populations within the GI tract are genetically dissimilar, suggesting limited exchange between colonization sites. In contrast, streptomycin pretreatment reduces genetic distances, indicating increased exchange.
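      As a rough illustration of what a genetic distance between two barcoded populations measures, the sketch below computes the Cavalli-Sforza chord distance between two vectors of barcode counts. That the study uses exactly this metric is an assumption, and the function name and example counts are hypothetical. Identical barcode frequencies give a distance of 0, and populations that share no barcodes give the maximum of 2√2/π (about 0.90).

```python
import math


def chord_distance(counts_a, counts_b):
    """Cavalli-Sforza chord distance between two barcode count vectors.

    counts_a and counts_b list read counts for the same barcodes in the
    same order. Assumed metric for illustration only.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must cover the same barcodes")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each population needs at least one read")
    # Sum of sqrt(p_i * q_i) over barcodes: 1 when frequencies match, 0 when disjoint.
    overlap = sum(
        math.sqrt((a / total_a) * (b / total_b))
        for a, b in zip(counts_a, counts_b)
    )
    overlap = min(1.0, overlap)  # guard against floating-point overshoot
    return (2.0 * math.sqrt(2.0) / math.pi) * math.sqrt(1.0 - overlap)


if __name__ == "__main__":
    cecum = [10, 20, 70, 0]    # hypothetical barcode counts
    colon = [12, 18, 70, 0]    # similar composition: small distance
    liver = [0, 0, 0, 100]     # disjoint barcode: maximal distance
    print(chord_distance(cecum, colon))
    print(chord_distance(cecum, liver))
```

      Under this kind of measure, a low distance between two sites is consistent with exchange between them, while a high distance suggests separately founded populations.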

      In extraintestinal organs, the bacterial burden is generally not substantially increased by streptomycin pretreatment, with significant differences observed only in the mesenteric lymph nodes and bile. However, the founding population sizes in these organs are increased. By comparing genetic distances between organs, the authors provide evidence that subpopulations colonizing extraintestinal organs diverge early after infection from those in the GI tract. This hypothesis is further tested by measuring bacterial burden and founding population sizes in the liver and GI tract at 5 and 120 hours post-infection. Additionally, they compare orogastric gavage infection with the less injurious method of infection via drinking, finding similar results for CFUs, founding populations, and genetic distances. These results argue against injuries during gavage as a route of direct infection.

      To bypass bottlenecks associated with the GI tract, the authors compare intravenous (IV) and intraperitoneal (IP) routes of infection. They find approximately a 10-fold increase in bacterial burden and founding population size in immune-rich organs with IV/IP routes compared to orogastric gavage in streptomycin-pretreated animals. This difference is interpreted as a result of "extra steps required to reach systemic organs."

      While IP and IV routes yield similar results in immune-rich organs, IP infections lead to higher bacterial burdens in nearby sites, such as the pancreas, adipose tissue, and intraperitoneal wash, as well as somewhat increased founding population sizes. The authors correlate these findings with the presence of white lesions in adipose tissue. Genetic distance comparisons reveal that, apart from the spleen and liver, IP infections lead to genetically distinct populations in infected organs, whereas IV infections generally result in higher genetic similarity.

      Finally, the authors investigate GI tract reseeding, identifying two distinct routes. They observe that the GI tracts of IP/IV-infected mice are colonized either by a clonal or a diversely tagged bacterial population. In clonally reseeded animals, the genetic distance within the GI tract is very low (often zero) compared to the bile population, which is predominantly clonal or pauciclonal. These animals also display pathological signs, such as cloudy/hardened bile and increased bacterial burden, leading the authors to conclude that the GI tract was reseeded by bacteria from the gallbladder bile. In contrast, animals reseeded by more complex bacterial populations show that bile contributes only a minor fraction of the tags. Given the large founding population size in these animals' GI tracts, which is larger than in orogastrically infected animals, the authors suggest a highly permissive second reseeding route, largely independent of bile. They speculate that this route may involve a reversal of known mechanisms that the pathogen uses to escape from the intestine.

      The manuscript presents a substantial body of work that offers a meticulously detailed understanding of the population dynamics of S. Typhimurium in mice. It quantifies the processes shaping the within-host dynamics of this pathogen and provides new insights into its spread, including previously unrecognized dissemination routes. The methodology is appropriate and carefully executed, and the manuscript is well-written, clearly presented, and concise. The authors' conclusions are well-supported by experimental results and thoroughly discussed. This work underscores the power of using highly diverse barcoded pathogens to uncover the within-host population dynamics of infections and will likely inspire further investigations into the molecular mechanisms underlying the bottlenecks and dissemination routes described here.

    3. Reviewer #2 (Public review):

      In this paper, Hotinger et al. propose an improved barcoded library system, called STAMPR, to study Salmonella population dynamics during infection. Using this system, the authors demonstrate significant diversity in the colonization of different Salmonella clones (defined by the presence of different barcodes) not only across different organs (liver, spleen, adipose tissues, pancreas and gall bladder) but also within different compartments of the same gastrointestinal tissue. Additionally, this system revealed that microbiota competition is the major bottleneck in Salmonella intestinal colonization, which can be mitigated by streptomycin treatment. However, this has been demonstrated previously in numerous publications. They also show that there was minimal sharing between populations found in the intestine and those in the other organs. Upon IV and IP infection to bypass the intestinal bottleneck, they were able to demonstrate, using this library, that Salmonella can re-enter the intestine through two possible routes. One route is essentially the reverse path used to escape the gut, leading to a diverse intestinal population, while the other, through the bile, typically results in a clonal population.

      Comments on latest version:

      The authors have addressed my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Hotinger et al. explore the population dynamics of Salmonella enterica serovar Typhimurium in mice using genetically tagged bacteria. In addition to physiological observations, pathology assessments, and CFU measurements, the study emphasizes quantifying host bottleneck sizes that limit Salmonella colonization and dissemination. The authors also investigate the genetic distances between bacterial populations at various infection sites within the host.

      Initially, the study confirms that pretreatment with the antibiotic streptomycin before inoculation via orogastric gavage increases the bacterial burden in the gastrointestinal (GI) tract, leading to more severe symptoms and heightened fecal shedding of bacteria. This pretreatment also significantly reduces between-animal variation in bacterial burden and fecal shedding. The authors then calculate founding population sizes across different organs, discovering a severe bottleneck in the intestine, with founding populations reduced by approximately 10^6-fold compared to the inoculum size. Streptomycin pretreatment increases the founding population size and bacterial replication in the GI tract. Moreover, by calculating genetic distances between populations, the authors demonstrate that, in untreated mice, Salmonella populations within the GI tract are genetically dissimilar, suggesting limited exchange between colonization sites. In contrast, streptomycin pretreatment reduces genetic distances, indicating increased exchange.

      In extraintestinal organs, the bacterial burden is generally not substantially increased by streptomycin pretreatment, with significant differences observed only in the mesenteric lymph nodes and bile. However, the founding population sizes in these organs are increased. By comparing genetic distances between organs, the authors provide evidence that subpopulations colonizing extraintestinal organs diverge early after infection from those in the GI tract. This hypothesis is further tested by measuring bacterial burden and founding population sizes in the liver and GI tract at 5 and 120 hours post-infection. Additionally, they compare orogastric gavage infection with the less injurious method of infection via drinking, finding similar results for CFUs, founding populations, and genetic distances. These results argue against injuries during gavage as a route of direct infection. 

      To bypass bottlenecks associated with the GI tract, the authors compare intravenous (IV) and intraperitoneal (IP) routes of infection. They find approximately a 10-fold increase in bacterial burden and founding population size in immune-rich organs with IV/IP routes compared to orogastric gavage in streptomycin-pretreated animals. This difference is interpreted as a result of "extra steps required to reach systemic organs."

      While IP and IV routes yield similar results in immune-rich organs, IP infections lead to higher bacterial burdens in nearby sites, such as the pancreas, adipose tissue, and intraperitoneal wash, as well as somewhat increased founding population sizes. The authors correlate these findings with the presence of white lesions in adipose tissue. Genetic distance comparisons reveal that, apart from the spleen and liver, IP infections lead to genetically distinct populations in infected organs, whereas IV infections generally result in higher genetic similarity. 

      Finally, the authors investigate GI tract reseeding, identifying two distinct routes. They observe that the GI tracts of IP/IV-infected mice are colonized either by a clonal or a diversely tagged bacterial population. In clonally reseeded animals, the genetic distance within the GI tract is very low (often zero) compared to the bile population, which is predominantly clonal or pauciclonal. These animals also display pathological signs, such as cloudy/hardened bile and increased bacterial burden, leading the authors to conclude that the GI tract was reseeded by bacteria from the gallbladder bile. In contrast, animals reseeded by more complex bacterial populations show that bile contributes only a minor fraction of the tags. Given the large founding population size in these animals' GI tracts, which is larger than in orogastrically infected animals, the authors suggest a highly permissive second reseeding route, largely independent of bile. They speculate that this route may involve a reversal of known mechanisms that the pathogen uses to escape from the intestine. 

      The manuscript presents a substantial body of work that offers a meticulously detailed understanding of the population dynamics of S. Typhimurium in mice. It quantifies the processes shaping the within-host dynamics of this pathogen and provides new insights into its spread, including previously unrecognized dissemination routes. The methodology is appropriate and carefully executed, and the manuscript is well-written, clearly presented, and concise. The authors' conclusions are well-supported by experimental results and thoroughly discussed. This work underscores the power of using highly diverse barcoded pathogens to uncover the within-host population dynamics of infections and will likely inspire further investigations into the molecular mechanisms underlying the bottlenecks and dissemination routes described here.

      Major point:

      Substantial conclusions in the manuscript rely on genetic distance measurements using the Cavalli-Sforza chord distance. However, it is unclear whether these genetic distance measurements are independent of the founding population size. I would anticipate that in populations with larger founding population sizes, where the relative tag frequencies are closer to those in the inoculum, the genetic distances would appear smaller compared to populations with smaller founding sizes independent of their actual relatedness. This potential dependency could have implications for the interpretation of findings, such as those in Figures 2B and 2D, where antibiotic-pretreated animals consistently exhibit higher founding population sizes and smaller genetic distances compared to untreated animals.

      Thank you for raising this important point regarding reliance on chord distances for gauging genetic distance in barcoded populations. The reviewer is correct that samples with more founders will be more similar to the inoculum and thus inherently more similar to other samples that also have more founders. However, creation of libraries containing very large numbers of unique barcodes can often circumvent this issue. In this case, the effect size of chance-based similarity is not large enough to change the interpretation of the data in Figures 2B and 2D. In our case, the library has ~6x10^4 barcodes, and the founding populations in Figure 2B are ~10^3. Randomly resampling to create two populations of 10^3 cells from an initial population with 6x10^4 barcodes is expected to yield largely distinct populations with very little similarity. Thus, the similarity between streptomycin-treated populations in Figure 2D is likely the result of biology rather than chance.
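
      The resampling argument above can be checked with a small simulation. The following is a minimal sketch, not the STAMPR pipeline: it assumes every barcode is equally frequent in the library, draws founders with replacement, and uses the standard form of the Cavalli-Sforza chord distance, (2√2/π)·sqrt(1 − Σ sqrt(p_i·q_i)). The function names, the uniform-library assumption, and the random seed are illustrative choices, not taken from the manuscript.

```python
import math
import random
from collections import Counter


def chord_distance(counts_a, counts_b):
    """Cavalli-Sforza chord distance between two barcode count tables."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # Sum of sqrt(p_i * q_i) over barcodes present in both samples.
    overlap = sum(
        math.sqrt((n / total_a) * (counts_b[bc] / total_b))
        for bc, n in counts_a.items()
        if bc in counts_b
    )
    # Clamp to guard against tiny negative values from rounding.
    return (2 * math.sqrt(2) / math.pi) * math.sqrt(max(0.0, 1.0 - overlap))


def sample_founders(n_barcodes, n_founders, rng):
    """Draw founders from a library assumed to have uniform barcode frequencies."""
    return Counter(rng.randrange(n_barcodes) for _ in range(n_founders))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pop_a = sample_founders(60_000, 1_000, rng)
pop_b = sample_founders(60_000, 1_000, rng)

shared = len(pop_a.keys() & pop_b.keys())
gd = chord_distance(pop_a, pop_b)
print(f"shared barcodes: {shared}, chord distance: {gd:.3f}")
```

      Under these assumptions only a few tens of barcodes are expected to be shared by chance, so the distance stays close to its maximum of 2√2/π (about 0.90), which is in line with the authors' statement that chance alone yields largely distinct populations.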

      Reviewer #2 (Public review):

      In this paper, Hotinger et al. propose an improved barcoded library system, called STAMPR, to study Salmonella population dynamics during infection. Using this system, the authors demonstrate significant diversity in the colonization of different Salmonella clones (defined by the presence of different barcodes) not only across different organs (liver, spleen, adipose tissues, pancreas, and gall bladder) but also within different compartments of the same gastrointestinal tissue. Additionally, this system revealed that microbiota competition is the major bottleneck in Salmonella intestinal colonization, which can be mitigated by streptomycin treatment. However, this has been demonstrated previously in numerous publications. They also show that there was minimal sharing between populations found in the intestine and those in the other organs. Upon IV and IP infection to bypass the intestinal bottleneck, they were able to demonstrate, using this library, that Salmonella can re-enter the intestine through two possible routes. One route is essentially the reverse path used to escape the gut, leading to a diverse intestinal population, while the other, through the bile, typically results in a clonal population. Although the authors showed that the STAMPR pipeline improved the ability to identify founder populations and their diversity within the same animal during infections, some of the conclusions appear speculative and not fully supported.

      (1) It's particularly interesting how the authors, using this system, demonstrate the dominant role of the microbiota bottleneck in Salmonella colonization and how it is widened by antibiotic treatment (Figure 1). Additionally, the ability to track Salmonella reseeding of the gut from other organs starting with IV and IP injections of the pathogen provides a new tool to study population dynamics (Figure 5). However, I don't think it is possible to argue that the proximal and distal small intestine, Peyer's patches (PPs), cecum, colon, and feces have different founder populations for reasons other than stochastic variations. All the barcoded Salmonella clones have the same fitness, and the fact that some are found or expanded in one region of the gastrointestinal tract rather than another likely results from random chance (such as being forced into a specific region of the gut for physical or spatial reasons) and subsequent expansion, rather than any inherent biological cause. For example, some bacteria may randomly adhere to the mucus, some may swim toward the epithelial layer, while others remain in the lumen; all will proliferate in those respective sites. In this way, different founder populations arise based on random localization during movement through the gastrointestinal tract, which is an observation, but it doesn't significantly contribute to understanding pathogen colonization dynamics or pathogenesis. Therefore, I would suggest placing less emphasis on describing these differences or better discussing this aspect, especially in the context of the gastrointestinal tract.

      Thank you for helping us identify this area for further clarification. We agree with the reviewer’s interpretation that seeding of proximal and distal small intestine, Peyer's patches (PPs), cecum, colon, and feces with different founder populations is likely caused by stochastic variations, consistent with separate stochastic bottlenecks to establishing these separate niches. To clarify this point we have modified the text in the results section, “Streptomycin treatment decreases compartmentalization of S. Typhimurium populations within the intestine”.

      Change to text:

      “Except for the cecum and colon, in untreated animals the S. Typhimurium populations in different regions of the intestine were dissimilar (Avg. GD ranged from 0.369 to 0.729, 2D left); i.e., there is little sharing between populations in the intestine. These data suggest that there are separate bottlenecks in different regions of the intestine that cause stochastic differences in the identity of the founders. Interestingly, when these founders replicate, they do not mix, remaining compartmentalized with little sharing between populations throughout the intestinal tract (i.e., barcodes found in one region are not in other regions, Figure S3). This was surprising as the luminal contents, an environment presumably conducive to bacterial movement, were not removed from these samples.”

      In this section we are interested in the underlying biology that occurs after the initial bottleneck to preserve this compartmentalization during outgrowth of the intestinal population. In other words, what prevents these separate populations from merging (e.g., what prevents the bacteria replicating in the proximal small intestine from traveling through the intestine and establishing a niche in the distal small intestine)? While we do not explore the mechanisms of compartmentalization, we observe that it is disrupted by streptomycin pretreatment, suggesting a microbiota-dependent biological cause. 

      (2) I do think that STAMPR is useful for studying the dynamics of pathogen spread to organs where Salmonella likely resides intracellularly (Figure 3). The observation that the liver is colonized by an early intestinal population, which continues to proliferate at a steady rate throughout the infection, is very interesting and may be due to the unique nature of the organ compared to the mucosal environment. What is the biological relevance during infection? Do the authors observe the same pattern (Figures 3C and G) when normalizing the population data for the spleen and mesenteric lymph nodes (mLN)? If not, what do the authors think is driving this different distribution?

      Thank you for raising this interesting point. These data indicate that the liver is seeded from the intestine early during infection. The timing and source of dissemination have relevance for understanding how host and pathogen variables control the spread of bacteria to systemic sites. For example, our conclusion (early dissemination) indicates that the immune state of a host at the time of exposure to a pathogen, and for a short period thereafter, is what primarily influences the process of dissemination, not the later response to an active infection.

      We observe that the liver and mucosal environments within the intestine have similar colonization behaviors. Both niches are seeded early during infection, followed by steady pathogen proliferation and compartmentalization that apparently inhibits further seeding. This results in the identity of barcodes in the liver population remaining distinct from the intestinal populations, and the intestinal populations remaining distinct from each other.

      We observe a similar pattern to the liver in the spleen and MLN (the barcodes in the spleen and MLN are dissimilar to the population in the intestine). To clarify this point, we have modified the text (below) and added this analysis as a supplemental figure (S4).

      Change to text:

      Genetic distance comparison of liver samples to other sites revealed that, regardless of streptomycin treatment, there was very little sharing of barcodes between the intestine and extraintestinal sites (Avg. GD >0.75, Figure 3C). Furthermore, the MLN and spleen populations also lacked similarity with the intestine (Figure S4). These analyses strongly support the idea that S. Typhimurium disseminates to extraintestinal organs relatively early following inoculation, before it establishes a replicative niche in the intestine.

      (3) Figure 6: Could the bile pathology be due to increased general bacterial translocation rather than Salmonella colonization specifically? Did the authors check for the presence of other bacteria (potentially also proliferating) in the bile? Do the authors know whether Salmonella's metabolic activity in the bile could be responsible for gallbladder pathology?

      The reviewer raises interesting points for future work. We did not check whether other bacterial species are translocating during S. Typhimurium infection. The relevance of Salmonella’s metabolic activity is also very interesting, and we hope these questions will be answered by future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) P. 9/10 "... the marked delay in shedding after IP and IV relative to orogastric inoculation suggest that the S. Typhimurium population encounters substantial bottleneck(s) on the route(s) from extraintestinal sites back to the intestine.": Can you conclude that from the data? It could also be possible that there is a biological mechanism (other than chance events) that delays the re-entry to the intestine.

      We propose that the delay in shedding indicates additional obstacles that bacteria face when re-entering the intestine, and that there are likely biological mechanisms that cause this delay. However, these unknown mechanisms effectively act as additional bottlenecks by causing a stochastic loss of population diversity. 

      (2) P. 11 "...both organs would likely contain all 10 barcodes. In contrast, a library with 10,000 barcodes can be used to distinguish between a bottleneck resulting in Ns = 1,000 and Ns = 10,000, since these bottlenecks result in a different number of barcodes in output samples. Furthermore, high diversity libraries reduce the likelihood that two tissue samples share the same barcode(s) due to random chance, enabling more accurate quantification of bacterial dissemination.": I agree with the general analysis, but I find it misleading to talk about the presence of barcodes when the analyses in this manuscript are based on the much more powerful comparison of relative abundance of individual tags instead of their presence or absence.

      The reviewer raises an excellent point, and the distinction between relative abundance versus presence/absence is discussed extensively in the original STAMPR manuscript. Although relative abundance is powerful, the primary metric used in this study (Ns) is calculated principally from the number of barcodes, corrected (via simulations) for the probability of observing the same barcode across distinct founders. Although this correction procedure does rely on barcode abundance, the primary driver of founding population quantification is the number of barcodes.
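
      The dependence of barcode counts on library size discussed in point (2) can be illustrated with a short calculation. This is a minimal sketch under the simplifying assumption that all barcodes are equally frequent, in which case the expected number of distinct barcodes among N founders drawn from a library of B barcodes is B·(1 − (1 − 1/B)^N). The function name is hypothetical, and the simulation-based correction used to compute Ns in the study is not reproduced here.

```python
def expected_distinct_barcodes(n_barcodes, n_founders):
    """Expected number of distinct barcodes among n_founders founders,
    assuming a library of n_barcodes equally frequent barcodes."""
    return n_barcodes * (1.0 - (1.0 - 1.0 / n_barcodes) ** n_founders)


# Compare a low-diversity and a high-diversity library at two bottleneck sizes.
for library_size in (10, 10_000):
    for founders in (1_000, 10_000):
        distinct = expected_distinct_barcodes(library_size, founders)
        print(f"library={library_size:>6}, founders={founders:>6}, "
              f"expected distinct barcodes={distinct:.1f}")
```

      With 10 barcodes both bottlenecks are expected to return essentially all 10 barcodes, whereas with 10,000 barcodes the expected counts differ markedly (roughly 950 versus 6,300), which is why barcode number alone is informative only for sufficiently diverse libraries.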

      (3) P.14 "the library in LB supplemented with SM was not significantly different than the parent strain" and Figure 2C: How was significance tested? How many times were the growth curves recorded? On my print-out, the red color has different shades for different growth curves.

      Significance was tested with a Mann-Whitney and growth curves were performed 5 times. Growth curves are displayed with 50% opacity, and as a result multiple curves directly on top of each other appear darker. The legend to S2 has been modified accordingly.

      (4) P.16: close bracket in the equation for FRD calculation.

      Done

      (5) Figure 2C "Average CFU per founder": I found the wording confusing at first as I thought you divided the average bacterial burden per organ by Ns, instead of averaging the CFU/Ns calculated for each mouse.

      The wording has been clarified. 

      (6) Figure 3B: It would be helpful to include expected genetic distances in the schematic as it is difficult to infer the genetic distance when only two of three, respectively, different "barcode colors" are used. While I find the explanation in the main text intuitive, a graphical representation would have helped me.

      Thank you for the suggestion. Unfortunately, using colors to represent barcodes is imperfect and limits the diversity that can be depicted. We have modified Figure 3B to further clarify. 

      (7) Figure 3C: Why do you compare the genetic distance to the liver, when you discuss the genetic distance of the intestinal population? Is it not possible that the intestinal populations are similar to the extraintestinal organs except the liver?

      For clarity, we chose to highlight exclusively the liver. However, we observed a similar pattern to the liver in other extraintestinal organs. To clarify the generalizability of this point we have added a supplemental figure with comparisons to MLN and Spleen (Supplemental figure S4) as well as further text.

      (8) Figure 3C & S5A: I found "+SM" and "+SM, Drinking" confusing and would have preferred "+SM, Gavage" and "+SM, Drinking" for clarity.

      Done, thank you for the suggestion.

      (9) Figure 3G&H: I find it worthy of discussion that the bacterial burden increases over time, while the founding population decreases. Does that not indicate that replication only occurs at specific sites leading to the amplification of only a few barcodes and thereby a larger change of the relative barcode abundance compared to the inoculum?

      From 5h to 120h the size of the founding population decreases in multiple intestinal sites. This likely indicates that the impact of the initial bottleneck is still ongoing at 5h, although further temporal analysis would be required to define the exact timing of the bottleneck. Notably, the passage time through the mouse intestine is ~5h. Many of the founders observed at 5h could belong to a population that will never establish a replicative niche and, failing to colonize, will be shed in the feces, bottlenecking the population between 5h and 120h. To clarify this point we have added the following text:

      Section “S. Typhimurium disseminates out of the intestine before establishing an intestinal replicative niche”.

      “In contrast to the liver, there were more founders present in samples from the intestine (particularly in the colon) at 5 hours versus 120 hours (Figure 3H). These data likely indicate that many of the founders observed in the intestine at 5 hours are shed in the feces prior to establishing a replicative niche, and demonstrate that the forces restricting the S. Typhimurium population in the intestine act over a period of > 5 hours.”

      (10) Figure S2A: I do not understand this figure. Why are there more than 70.000 tags listed? I was under the impression the barcode library in S. Typhimurium had 55.000 tags while only the plasmid pSM1 had more than 70.000 (but the plasmid should not be relevant here). Why are there distinct lines at approximately 10^-5 and a bit lower? I would have expected continuously distributed barcode frequencies.

      During barcode analysis, each library is mapped to the total barcode list in the barcode donor pSM1, which contains ~70,000 barcodes. This enables consistent analysis across different bacterial libraries. The designation “barcode number” refers to the barcode number in pSM1, meaning many of the barcodes in the Salmonella library are at zero reads. This graph type was chosen to show there was no bias toward a particular barcode, however there is significant overlap of the points, making individual barcode frequencies difficult to see. We have changed the x-axis to state “pSM1 Barcode Number” and clarified in the figure legend.

      Since the y-axis on these graphs is on a log10 scale, the lines represent barcodes with 1 read, 2 reads, 3 reads, etc. As the number of reads per barcode increases linearly, the space between them decreases on logarithmic axes.
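
      The banding described above follows directly from the arithmetic of a log scale. The sketch below assumes a hypothetical sequencing depth of 10^5 total reads, chosen only so that a single read corresponds to a frequency of 10^-5 as in the reviewer's question; the actual read depth is not stated here.

```python
import math

total_reads = 100_000  # hypothetical depth, so that 1 read = frequency 1e-5

# Frequencies of barcodes supported by 1, 2, 3, 4 and 5 reads.
frequencies = [reads / total_reads for reads in range(1, 6)]
log_positions = [math.log10(f) for f in frequencies]

# Vertical gap between consecutive bands on a log10 axis.
gaps = [upper - lower for lower, upper in zip(log_positions, log_positions[1:])]

for reads, position in zip(range(1, 6), log_positions):
    print(f"{reads} read(s): log10(frequency) = {position:.3f}")
print("gaps between consecutive bands:", [round(g, 3) for g in gaps])
```

      The gap between the k-read and (k+1)-read bands is log10((k+1)/k), which shrinks as k grows, so low-count barcodes appear as discrete lines that merge into a continuum at higher read counts.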

      (11) There are a few typos in the figure legends of the supplementary material. For example, in Figure S2, S. Typhimurium is not italicized and ~7x105 has no superscript. In Fig. S4&5, the "O" in ", Open circles" is capitalized.

      Typos have been corrected.

    1. eLife Assessment

      The current human tissue-based study provides compelling evidence correlating hippocampal expressions of RNA guanine-rich G-quadruplexes with aging and with Alzheimer's Disease presence and severity. The results are fundamental and will rejuvenate our understanding of aging and AD's pathogenesis.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context and the authors provide compelling evidence to show an association between BG4 staining and age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible.

    3. Reviewer #2 (Public review):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction.

      In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92).

      This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context and the authors provide compelling evidence to show an association between BG4 staining and age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible. There were two main things that needed addressing:

      (1) Usually in BG4 staining experiments to ensure that the signal detected is genuinely due to rG4 an RNase treatment experiment is performed. This does not have to be extended to all the samples presented but having a couple of controls where the authors observe loss of staining upon RNase treatment will be key to ensure with confidence that rG4s are detected under the experimental conditions. This is particularly relevant for this brain tissue samples where BG4 staining has never been performed before.

      (2) The authors have an association between rG4-formation and age/disease progression. They also observe distribution dependency of this, which is great. However, this is still an association which does not allow the model to be supported. This is not something that can be fixed with an easy experiment and it is what it is, but my point is that the narrative of the manuscript should be more fair and reflect the fact that, although interesting, what the authors are observing is a simple correlation. They should still go ahead and propose a model for it, but they should be more balanced in the conclusion and do not imply that this evidence is sufficient to demonstrate the proposed model. It is absolutely fine to refer to the literature and comment on the fact that similar observations have been reported and this is in line with those, but still this is not an ultimate demonstration.

      Comments on current version:

      The authors have now addressed my concerns.

      We thank the reviewer for their support!

      Reviewer #2 (Public review):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction.

      In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92).

      This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

      Main issue from the previous round of review:

      There is indeed a positive correlation between Braak stage severity and BG4 staining, but this correlation is relatively weak and borderline significant (R = 0.52, p value = 0.028). This is probably the main limitation of this study, which should be clearly acknowledged (together with a reminder that "correlation is not causality"). Related to this, there is no clear justification to exclude the four individuals in Fig 1d (without them R increases to 0.78). Please remove this statement. On the other hand, the difference based on APOE status is more striking.

      Comments on current version:

      The authors have made laudable efforts to address the criticisms I made in my evaluation of the original manuscript.

      We thank the reviewer for their support!

      Recommendations for the authors:

      Reviewing Editor:

      I would suggest two minor edits:

      - The findings are correlative and descriptive, but the title implies functionality (A New Role for RNA G-quadruplexes in Aging and Alzheimer's Disease). I would suggest toning down this title.

      - While I understand the limitations in performing additional biochemical experiments to validate the immunofluorescence study, I think this is worth mentioning as a limitation in the text.

      We have made these two changes as requested, altering the title to remove the word Role that may imply more meaning than intended, and adding a line to the discussion on the need for future additional biochemical experiments.

      Reviewer #1 (Recommendations for the authors):

      Thanks for addressing the concerns raised.

      We thank the reviewer for their support!

      Reviewer #2 (Recommendations for the authors):

      Minor point:

      Related to the "correlation is not causality" remark I made in my evaluation of the original manuscript: the authors' answer is reasonable. Still, I would suggest modifying the abstract: "we propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse" => "we propose a model of neurodegeneration in which chronic rG4 formation is linked to proteostasis collapse"

      All other remarks I made have been answered properly.

      We thank the reviewer for their support! We have made the change exactly as requested by the reviewer.

    1. eLife Assessment

      This important study provides information on the TMEM16 family of membrane proteins, which play roles in lipid scrambling and ion transport. By simulating 27 structures representing five distinct family members, the authors captured hundreds of lipid scrambling events, offering insights into the mechanisms of lipid translocation and the specific protein regions involved in these processes. However, while the data on groove dilation is compelling, the evidence for outside-the-groove scramblase activity without experimental validation is inadequate and is based on a limited set of observed events.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

    3. Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca²⁺-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca²⁺-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion on how the choice of force field and simulation parameters influence the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively.

      An interesting observation is the asymmetry in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors, in order to preserve the fold, have applied a strong (500 kJ mol⁻¹ nm⁻²) elastic network. However, I am wondering how the ENM applies across the dimer and whether any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How might this have biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism to occur mainly across a single monomer? Answering this question would have far-reaching implications for better describing the mechanism of scrambling.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

    4. Reviewer #3 (Public review):

      Summary:

      The paper investigates the TMEM16 family of membrane proteins, which play roles in lipid scrambling and ion transport. A total of 27 experimental structures from five TMEM16 family members were analyzed, including mammalian and fungal homologs (e.g., TMEM16A, TMEM16F, TMEM16K, nhTMEM16, afTMEM16). The identified structures were in both Ca²⁺-bound (open) and Ca²⁺-free (closed) states to compare conformations and were preprocessed (e.g., modeling missing loops) and equilibrated. Coarse-grained simulations were performed in DOPC membranes for 10 microseconds to capture the scrambling events. These events were identified by tracking lipids transitioning between the two membrane leaflets. The authors then analysed the correlation between scrambling rates and structural properties such as groove dilation and membrane thinning. They report 700 scrambling events across structures, and Figure 2 elaborates on how open structures show higher activity, as expected. The authors also address how structures may require open grooves; this and other mechanisms around scrambling are somewhat controversial in the field.
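
      The leaflet-tracking step summarized above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the array layout (frames by lipids), the midplane-relative headgroup heights and the 1.0 nm cutoff are all assumptions made for illustration. A lipid counts as scrambled only when it moves from clearly inside one leaflet to clearly inside the other, so fluctuations near the midplane are ignored.

```python
import numpy as np


def count_scrambling_events(z, cutoff=1.0):
    """Count leaflet-to-leaflet transitions for each lipid.

    z      : array of shape (n_frames, n_lipids) holding headgroup heights
             relative to the bilayer midplane (hypothetical input, in nm).
    cutoff : a lipid is in the upper leaflet when z > cutoff and in the lower
             leaflet when z < -cutoff; frames in between are left undecided
             (assumed value, not taken from the manuscript).
    """
    z = np.asarray(z, dtype=float)
    # +1 = upper leaflet, -1 = lower leaflet, 0 = undecided (near the midplane)
    state = np.where(z > cutoff, 1, np.where(z < -cutoff, -1, 0))
    events = np.zeros(z.shape[1], dtype=int)
    for j in range(z.shape[1]):
        decided = state[:, j][state[:, j] != 0]  # drop undecided frames
        # every change of sign between consecutive decided frames is one event
        events[j] = np.count_nonzero(np.diff(decided) != 0)
    return events


def scrambling_rate(events, sim_time_us):
    """Total number of events divided by the simulated time in microseconds."""
    return events.sum() / sim_time_us


# Toy trajectory: lipid 0 flips once, lipid 1 stays in the lower leaflet,
# lipid 2 dips towards the midplane but returns to the upper leaflet.
z_demo = np.array([
    [2.0, -2.0, 2.0],
    [0.5, -2.1, 1.8],
    [-0.3, -1.9, 0.2],
    [-2.0, -2.0, -0.4],
    [-1.8, -2.2, 0.6],
    [-2.1, -2.0, 2.1],
])
events_demo = count_scrambling_events(z_demo)
rate_demo = scrambling_rate(events_demo, sim_time_us=10.0)
```

      The dead zone around the midplane is a design choice of this sketch: without it, a lipid hovering at the bilayer centre would be counted many times.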

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      Weaknesses:

      A weakness of the work is that it does not fully reconcile with the experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, although this is challenging to achieve with coarse-grained molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca²⁺-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the yet unanswered question of how scrambling by TMEM16s occurs in the absence of Ca²⁺, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer especially in the absence of Ca²⁺, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca²⁺ and in the presence of PIP2) which feature structural rearrangements to TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca²⁺-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think there is however value in comparing the scrambling competence of these TMEM16 structures in a consistent manner in a single study to reduce inconsistencies that may be introduced by different simulation methods, parameters, environmental variables such as lipid composition as used in other published works of single family members. The consistency across our simulations and high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca²⁺-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca²⁺-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca²⁺-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion on how the choice of force field and simulation parameters influence the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively.

      An interesting observation is the asymmetry in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors, in order to preserve the fold, have applied a strong (500 kJ mol⁻¹ nm⁻²) elastic network. However, I am wondering how the ENM applies across the dimer and whether any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How might this have biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism to occur mainly across a single monomer? Answering this question would have far-reaching implications for better describing the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated and experimental scrambling rates is complicated and is best interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.
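
      To give a sense of scale for a speed-up of 2-3 orders of magnitude, here is a minimal sketch that converts a rate ratio into an apparent free-energy barrier difference. It assumes an Arrhenius-type relation, k ∝ exp(-ΔG/kT), with an identical prefactor in simulation and experiment; neither that assumption nor the 310 K temperature comes from the manuscript, and the function names are hypothetical.

```python
import math

# Molar gas constant in kJ mol^-1 K^-1 (CODATA value)
R_KJ_PER_MOL_K = 8.314462618e-3


def barrier_shift_kT(rate_ratio):
    """Apparent barrier difference in units of kT for a given rate ratio.

    Assumes k ~ exp(-dG / kT) with the same prefactor for both rates,
    so that ddG / kT = ln(k_fast / k_slow).
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return math.log(rate_ratio)


def barrier_shift_kj_per_mol(rate_ratio, temperature_k=310.0):
    """Same quantity in kJ/mol; the default temperature is an assumed value."""
    return barrier_shift_kT(rate_ratio) * R_KJ_PER_MOL_K * temperature_k


# Rates that are 100x and 1000x too fast
shift_2_orders = barrier_shift_kT(1e2)
shift_3_orders = barrier_shift_kT(1e3)
shift_3_orders_kj = barrier_shift_kj_per_mol(1e3)
```

      Under these assumptions, a speed-up of 2-3 orders of magnitude corresponds to an apparent barrier that is lower by roughly 4.6 to 6.9 kT, which is one way to see why qualitative comparison with experiment is the safer reading.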

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes because DOPC (1) has negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good standardized protocol and solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature that could all affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      A weakness of the work is that it does not fully reconcile with the experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, although this is challenging to achieve with coarse-grained molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca²⁺-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

    1. eLife Assessment

      This study presents valuable data on the increase in individual differences in functional connectivity with the auditory cortex in individuals with congenital/early-onset hearing loss compared to individuals with normal hearing. The evidence supporting the study's claims is convincing, although additional work using resting-state functional connectivity in the future could further strengthen the results. The work will be of interest to neuroscientists working on brain plasticity and may have implications for the design of interventions and compensatory strategies.

    2. Reviewer #1 (Public review):

      This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest.

      Comment from Reviewing Editor: In the revised manuscript, the authors have addressed all concerns previously identified by reviewer 1.

    3. Reviewer #3 (Public review):

      Summary:

      This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      The manuscript is well-written, and the methods are clearly described and appropriate. Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes. The results are interesting and novel.

      Weaknesses:

      Analyses were conducted for task-based data rather than resting-state data. The authors report behavioral differences between groups and include behavioral performance as a nuisance regressor in their analysis. This is a good approach to account for behavioral task differences, given the data. Nevertheless, additional work using resting-state functional connectivity could remove the potential confound fully.

      Comment from Reviewing Editor: In the revised manuscript, the authors have addressed all concerns previously identified by reviewer 3, and the eLife assessment statement reflects the point by reviewer 3 that using resting-state functional connectivity in the future could further strengthen the results.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review):

      This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest.

      R: Thank you for your positive and thoughtful feedback.

      Reviewer #3 (Public review):

      Summary:

      This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      The manuscript is well-written, and the methods are clearly described and appropriate. Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes. The results are interesting and novel.

      Weaknesses:

      Analyses were conducted for task-based data rather than resting-state data. The authors report behavioral differences between groups and include behavioral performance as a nuisance regressor in their analysis. This is a good approach to account for behavioral task differences, given the data. Nevertheless, additional work using resting-state functional connectivity could remove the potential confound fully.

      The authors have addressed my concerns well.

      Thank you for your thoughtful feedback. We appreciate your acknowledgment of the strengths of our study and the approaches taken to address potential confounds. As noted, we discuss the limitation of not including resting-state data in the manuscript, and we agree that this represents an important avenue for future research. We hope to address this question in future studies.

    1. eLife Assessment

      This fundamental study provides a critical challenge to a great many studies of the neural correlates of consciousness that were based on post hoc sorting of reported awareness experience. The evidence supporting this criticism is compelling, based on simulations and decoding analysis of EEG data. The results will be of interest not only to psychologists and neuroscientists but also to philosophers who work on addressing mind-body relationships.

    2. Reviewer #1 (Public review):

      The study aimed to investigate the significant impact of criterion placement on the validity of neural measures of consciousness, examining how different standards for classifying a stimulus as 'seen' or 'unseen' can influence the interpretation of neural data. They conducted simulations and EEG experiments to demonstrate that the Perceptual Awareness Scale, a widely used tool in consciousness research, may not effectively mitigate criterion-related confounds, suggesting that even with the PAS, neural measures can be compromised by how criteria are set. Their study challenged existing paradigms by showing that the construct validity of neural measures of conscious and unconscious processing is threatened by criterion placement, and they provided practical recommendations for improving experimental designs in the field. The authors' work contributes to a deeper understanding of the nature of conscious and unconscious processing and addresses methodological concerns by exploring the pervasive influence of criterion placement on neural measures of consciousness and discussing alternative paradigms that might offer solutions to the criterion problem.

      The study effectively demonstrates that the placement of criteria for determining whether a stimulus is 'seen' or 'unseen' significantly impacts the validity of neural measures of consciousness. The authors found that conservative criteria tend to inflate effect sizes, while liberal criteria reduce them, leading to potentially misleading conclusions about conscious and unconscious processing. The authors employed robust simulations and EEG experiments to demonstrate the effects of criterion placement, ensuring that the findings are well-supported by empirical evidence. The results from both experiments confirm the predicted confounding effects of criterion placement on neural measures of unconscious and conscious processing.

      The results are consistent with their hypotheses and contribute meaningfully to the field of consciousness research.

    3. Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy in consciousness and unconsciousness, utilizing either simulated data or reanalyzing experimental data with post-hoc sorting data.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      Weaknesses:

      (1) In the realm of research methodology, conducting post-hoc sorting based on subject reports raises an issue. This operation leads to an imbalance in the number of trials between the two conditions (Target and NonTarget) during the decoding process. Such trial number disparity introduces bias during decoding, likely contributing to fluctuations in neural decoding performance. This potential confounding factor significantly impacts the interpretation of research findings. The trial number imbalance may cause models to exhibit a bias towards the category with more trials during the learning process, leading to misjudgments of neural signal differences between the two conditions and failing to accurately reflect the distinctions in brain neural activity between target and non-target states. Therefore, it is recommended that the authors extensively discuss this confounding factor in their paper. They should analyze in detail how this factor could influence the interpretation of results, such as potentially exaggerating or diminishing certain effects, and whether measures are necessary to correct the bias induced by this imbalance to ensure the reliability and validity of the research conclusions.
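
      The trial-count imbalance described in this point is often handled by undersampling the larger class before a classifier is trained. The following is a minimal sketch of that general idea, not the authors' pipeline: the function name `undersample_to_balance`, the label strings, and the fixed seed are all hypothetical choices made for illustration.

```python
import random


def undersample_to_balance(trials, labels, seed=0):
    """Return (trial, label) pairs with an equal number of trials per label.

    Hypothetical helper: each class is randomly reduced to the size of the
    smallest class, so a classifier cannot gain from favoring the majority.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible

    # Group trials by their label.
    by_label = {}
    for trial, label in zip(trials, labels):
        by_label.setdefault(label, []).append(trial)

    # Every class is cut down to the size of the smallest class.
    n_keep = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        for trial in rng.sample(group, n_keep):
            balanced.append((trial, label))

    rng.shuffle(balanced)  # avoid blocks of identical labels
    return balanced


# Usage with made-up trial indices: 7 target trials, 3 non-target trials.
example = undersample_to_balance(
    trials=list(range(10)),
    labels=["target"] * 7 + ["nontarget"] * 3,
)
```

      Undersampling discards data, so in practice it is usually repeated over several random subsamples and the decoding scores are averaged; whether that is needed depends on the classifier and the performance metric used.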

    4. Reviewer #3 (Public review):

      Summary:

      Fahrenfort et al. investigate how liberal or conservative criterion placement in a detection task affects the construct validity of neural measures of unconscious cognition and conscious processing. Participants identified instances of "seen" or "unseen" in a detection task, a method known as post hoc sorting. Simulation data convincingly demonstrate that, counterintuitively, a conservative criterion inflates effect sizes of neural measures compared to a liberal criterion. While the impact of criterion shifts on effect size is suggested by signal detection theory, this study is the first to address this explicitly within the consciousness literature. Decoding analysis of data from two EEG experiments further shows that different criteria lead to differential effects on classifier performance in post hoc sorting. The findings underscore the pervasive influence of experimental design and participant reports on neural measures of consciousness, revealing that criterion placement poses a critical challenge for researchers.

      Strengths and Weaknesses:

      One of the strengths of this study is the inclusion of the Perceptual Awareness Scale (PAS), which allows participants to provide more nuanced responses regarding their perceptual experiences. This approach ensures that responses at the lowest awareness level (selection 0) are made only when trials are genuinely unseen. This methodological choice is important as it helps prevent the overestimation of unconscious processing, enhancing the validity of the findings.

      The authors also do a commendable job in the discussion by addressing alternative paradigms, such as wagering paradigms, as a possible remedy to the criterion problem (Peters & Lau, 2015; Dienes & Seth, 2010). Their consideration of these alternatives provides a balanced view and strengthens the overall discussion.

      Our initial review identified a lack of measures of variance as one potential weakness of this work. However, we agree with the authors' response that plotting individual data points for each condition is indeed a good visualization of variance within a dataset.

      Impact of the Work:

      This study effectively demonstrates a phenomenon that, while understood within the context of signal detection theory, has been largely unexplored within the consciousness literature. Subjective measures may not reliably capture the construct they aim to measure due to criterion confounds. Future research on neural measures of consciousness should account for this issue, and no-report measures may be necessary until the criterion problem is resolved.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper proposes that the placement of criteria for determining whether a stimulus is 'seen' or 'unseen' can significantly impact the validity of neural measures of consciousness. The authors found that conservative criteria, which require stronger evidence to classify a stimulus as 'seen,' tend to inflate effect sizes in neural measures, making conscious processing appear more pronounced than it is. Conversely, liberal criteria, which require less evidence, reduce these effect sizes, potentially underestimating conscious processing. This variability in effect sizes due to criterion placement can lead to misleading conclusions about the nature of conscious and unconscious processing.

      Furthermore, the study highlights that the Perceptual Awareness Scale (PAS), a commonly used tool in consciousness research, does not effectively mitigate these criterion-related confounds. This means that even with PAS, the validity of neural measures can still be compromised by how criteria are set. The authors emphasize the need for careful consideration and standardization of criterion placement in experimental designs to ensure that neural measures accurately reflect the underlying cognitive processes. By addressing this issue, the paper aims to improve the reliability and validity of findings in the field of consciousness research.
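
      The criterion effect summarized above can be illustrated with a textbook equal-variance signal detection model. This is a sketch under stated assumptions, not the simulation used in the manuscript: internal evidence on target trials is assumed to be normally distributed with mean d' and unit variance, a trial is reported 'seen' when the evidence exceeds the criterion, and the function name and parameter values are hypothetical. The means of the two report categories then follow from the standard truncated-normal formulas.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance


def mean_evidence_by_report(d_prime, criterion):
    """Mean internal evidence on target trials, split by report.

    Assumes evidence ~ Normal(d_prime, 1) on target trials and a 'seen'
    report whenever evidence > criterion. Returns (seen_mean, unseen_mean).
    """
    a = criterion - d_prime          # criterion in standardized units
    pdf = STD_NORMAL.pdf(a)
    cdf = STD_NORMAL.cdf(a)          # proportion of misses
    seen_mean = d_prime + pdf / (1.0 - cdf)   # mean of the upper tail (hits)
    unseen_mean = d_prime - pdf / cdf         # mean of the lower tail (misses)
    return seen_mean, unseen_mean


# Same sensitivity, two hypothetical criterion placements.
liberal = mean_evidence_by_report(d_prime=1.0, criterion=0.0)
conservative = mean_evidence_by_report(d_prime=1.0, criterion=1.5)

print("liberal      (seen, unseen):", liberal)
print("conservative (seen, unseen):", conservative)
```

      Under these assumptions, moving the criterion upward raises the mean evidence in both the 'seen' and the 'unseen' category even though sensitivity is unchanged, which is the sense in which a conservative criterion can inflate post hoc sorted measures.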

      Strengths:

      (1) This research provides a fresh perspective on how criterion placement can significantly impact the validity of neural measures in consciousness research.

      (2) The study employs robust simulations and EEG experiments to demonstrate the effects of criterion placement, ensuring that the findings are well-supported by empirical evidence.

      (3) By highlighting the limitations of the PAS and the impact of criterion placement, the study offers practical recommendations for improving experimental designs in consciousness research.

      Weaknesses:

      The PAS, the primary focus of the study, is a commonly used tool, but other measures of consciousness were not evaluated and might be subject to similar or different criterion limitations. A simulation could be applied to these metrics to show how generalizable the study's conclusion is.

      We would like to thank reviewer 1 for their positive words and for taking the time to evaluate our manuscript. We agree that it would be important to gauge generalization to other metrics of consciousness. Note however, that the most commonly used alternative methods are post-decision wagering and confidence, both of which are known to behave quite similarly to the PAS (Sandberg, Timmermans, Overgaard & Cleeremans, 2010). Indeed, we have confirmed in other work that confidence is also sensitive to criterion shifts (see https://osf.io/preprints/psyarxiv/xa4fj). Although it has been claimed that confidence-derived aggregate metrics like meta-d’ or metacognitive efficiency may overcome criterion shifts, it would require empirical data rather than simulation to settle whether this is true or not (also see the discussion in https://osf.io/preprints/psyarxiv/xa4fj). Furthermore, out of these metrics, the PAS seems to be the preferred one amongst consciousness researchers (see figure 4 in Francken, Beerendonk, Molenaar, Fahrenfort, Kiverstein, Seth, van Gaal, 2022; as well as https://osf.io/preprints/psyarxiv/bkxzh). Thus, because other metrics are expected to behave in similar ways, and/or because more empirical work would be needed to determine along which dimension(s) criterion shifts would operate in alternative metrics, we see no clear path to implement the suggested simulations. We anticipate that aiming to do this would require a considerable amount of additional work, figuring out many things which we believe would better suit a future project. We would of course be open to doing this if the reviewer would have more specific suggestions for how to go about the proposed simulations.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy in consciousness and unconsciousness, utilizing either simulated data or reanalyzing experimental data with post-hoc sorting data.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated the subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      Weaknesses:

      (1) The response criterion plays a crucial role in influencing neural decoding because a subject's report may not always align with the actual stimulus presented. This discrepancy can occur in cases of false alarms, where a subject reports seeing a target that was not actually there, or in cases where a target is present but not reported. Some may argue that only using data from consistent trials (those with correct responses) would not be affected by the response criterion. However, the authors' analysis suggests that a conservative response criterion not only reduces false alarms but also impacts hit rates. It is important for the authors to further investigate how the response criterion affects neural decoding even when considering only correct trials.

      We would like to thank reviewer 2 for taking the time to evaluate our manuscript. We appreciate the suggestion to investigate neural decoding on only correct trials. What we in fact did is consider target trials that are 'correct' (hits = seen target present trials) and 'incorrect' (misses = unseen target present trials) separately, see figure 4A and figure 4B. This shows that the response criterion also affects the neural measure of consciousness when only considering correct target present trials. Note however, that one cannot decode 'unseen' (target present) trials if one only aims to decode 'correct' trials, because those are all incorrect by definition. We did not analyze false alarms (these would be the 'seen' trials on the noise distribution of Figure 1A), as there were not enough trials of those, especially in the conservative condition (see Figure 2C and 2D), making comparisons between conservative and liberal impossible. However, the predictions for false alarms are pretty straightforward, and follow directly from the framework in Figure 1.

      (2) The author has utilized decoding target vs. nontarget as the neural measures of unconscious and/or conscious processing. However, it is important to note that this is just one of the many neural measures used in the field. There are an increasing number of studies that focus on decoding the conscious content, such as target location or target category. If the author were to include results on decoding target orientation and how it may be influenced by response criterion, the field would greatly benefit from this paper.

      We thank the reviewer for the suggestion to decode orientation of the target. In our experiments, the target itself does not have an orientation, but the texture of which it is composed does. We used four orientations, which were balanced out within and across conditions such that presence-absence decoding is never driven by orientation, but rather by texture based figure-ground segregation (for similar logic, see for example Fahrenfort et al, 2007; 2008 etc). There are a couple of things to consider when wanting to apply a decoding analysis on the orientation of these textures:

      (1) Our behavioral task was only on the presence or absence of the target, not on the orientation of the textures. This makes it impossible to draw any conclusions about the visibility of the orientation of the textures. Put differently: based on behavior there is no way of identifying seen or unseen orientations, correctly or incorrectly identified orientations etc. For example, it is easy to envision that an observer detects a target without knowing the orientation that defines it, or vice versa a situation in which an observer does not detect the target while still being aware of the orientation of a texture in the image (either of the figure, or of the background). The fact that we have no behavioral response to the orientation of the textures severely limits the usefulness of a hypothetical decoding effect on these orientations, as such results would be uninterpretable with respect to the relevant dimension in this experiment, which is visibility.

      (2) This problem is further exacerbated by the fact that the orientation of the background is always orthogonal to the orientation of the target. Therefore, one would not only be decoding the orientation of the texture that constitutes the target itself, but also the texture that constitutes the background. Given that we also have no behavioral metric of how/whether the orientation of the background is perceived, it is similarly unclear how one would interpret any observed effect.

      (3) Finally, it is important to note that – even when categorization/content is sometimes used as an auxiliary measure in consciousness research (often as a way to assay objective performance) - consciousness is most commonly conceptualized on the presence-absence dimension. A clear illustration of this is the concept of blindsight. Blindsight is the ability of observers to discriminate stimuli (i.e. identify content) without being able to detect them. Blindsight is often considered the bedrock of the cognitive neuroscience of consciousness as it acts as proof that one can dissociate between unconscious processing (the categorization of a stimulus, i.e. the content) and conscious processing of that stimulus (i.e. the ability to detect it).

      Given the above, we do not see how the suggested analysis could contribute to the conclusions that the manuscript already establishes. We hope that – given the above - the reviewer agrees with this assessment.

      Reviewer #3 (Public review):

      Summary:

      Fahrenfort et al. investigate how liberal or conservative criterion placement in a detection task affects the construct validity of neural measures of unconscious cognition and conscious processing. Participants identified instances of "seen" or "unseen" in a detection task, a method known as post hoc sorting. Simulation data convincingly demonstrate that, counterintuitively, a conservative criterion inflates effect sizes of neural measures compared to a liberal criterion. While the impact of criterion shifts on effect size is suggested by signal detection theory, this study is the first to address this explicitly within the consciousness literature. Decoding analysis of data from two EEG experiments further shows that different criteria lead to differential effects on classifier performance in post hoc sorting. The findings underscore the pervasive influence of experimental design and participant reports on neural measures of consciousness, revealing that criterion placement poses a critical challenge for researchers.

      Strengths and Weaknesses:

      One of the strengths of this study is the inclusion of the Perceptual Awareness Scale (PAS), which allows participants to provide more nuanced responses regarding their perceptual experiences. This approach ensures that responses at the lowest awareness level (selection 0) are made only when trials are genuinely unseen. This methodological choice is important as it helps prevent the overestimation of unconscious processing, enhancing the validity of the findings.

      A potential area for improvement in this study is the use of single time-points from peak decoding accuracy to generate current source density topography maps. While we recognize that the decoding analysis employed here differs from traditional ERP approaches, the robustness of the findings could be enhanced by exploring current source density over relevant time windows. Event-related peaks, both in terms of timing and amplitude, can sometimes be influenced by noise or variability in trial-averaged EEG data, and a time-window analysis might provide a more comprehensive and stable representation of the underlying neural dynamics.

      We thank reviewer 3 for their positive words and for taking the time to evaluate our manuscript. If we understand the reviewer correctly, he/she suggests that the signal-to-noise ratio could be improved by averaging over time windows rather than taking the values at singular peaks in time. Before addressing this suggestion, we would like to point out that we plotted the relevant effects across time in Supplementary Figure S1A and S1B. These show that the observed effects were not somehow limited in time, i.e. only occurring around the peaks, but that they consistently occurred throughout the time course of the trial. In line with this observation one might argue that the results could be improved further by averaging across windows of interest rather than taking the peak moments alone, as the reviewer suggests. Although this might be true, there are many analysis choices that one can make, each of which could have a positive (or negative) effect on the signal-to-noise ratio. For example, when taking a window of interest, one is faced with a new choice to make, this time regarding the number of consecutive samples to average across (i.e. the size of the window), etc. More generally there is a long list of choices that may affect the precise outcome of analyses, either positively or negatively. Having analyzed the data in one way, the problem with adding new analysis approaches is that there is no objective criterion for deciding which analysis would be ‘best’, other than looking at the outcome of the statistical analyses themselves. Doing this would constitute an explorative double-dipping-like approach to analyzing the results, which – aside from potentially increasing the signal-to-noise ratio – is likely to also result in an increase of the type I error rate.

      In the past, when the first author of this manuscript has attempted to minimize the number of statistical tests, he has lowered the number of EEG time points by simply taking the peaks (for example see https://doi.org/10.1073/pnas.1617268114), and that is the approach that was taken here as well. Given the above, we prefer not to further ‘try out’ additional analytical approaches on this dataset, simply to improve the results. We hope the reviewer sympathizes with our position that it is methodologically most sound to stick to the analyses we have already performed and reported, without further exploration.

      It is helpful that the authors show the standard error of the mean for the classifier performance over time. A similar indication of a measure of variance in other figures could improve clarity and transparency.

      That said, the paper appears solid regarding technical issues overall. The authors also do a commendable job in the discussion by addressing alternative paradigms, such as wagering paradigms, as a possible remedy to the criterion problem (Peters & Lau, 2015; Dienes & Seth, 2010). Their consideration of these alternatives provides a balanced view and strengthens the overall discussion.

      We thank the reviewer for this suggestion. Note that we already have a measure of variance in the other figures too, namely showing the connected data points of individual participants. Individual data points as a visualization of variance are preferred by many journals (e.g., see https://www.nature.com/documents/cr-gta.pdf), and also show the spread of relevant differences when paired points are connected. For example, in Figure 2, 3 and 4, the relevant difference is between the liberal and conservative condition. When wanting to show the spread of the differences between these conditions, one option would be to first subtract the two measures in a pairwise fashion (e.g., liberal-conservative), and then plot the spread of those differences using some metric (e.g. standard error/CI of the mean difference). However, this has the disadvantage of no longer separately showing the raw scores on the conditions that are being compared. Showing conditions separately provides clarity to the reader about what is being compared to what. The most common approach to visualizing the variance of the relevant difference in such cases is to plot the connected individual data points of all participants in the same plot. The uniformity of the slope of these lines in such a visualization provides direct insight into the spread of the relevant difference. Plotting the standard error of the mean on the raw scores of the conditions in these plots would not help, because this would not visualize the spread of the relevant difference (liberal-conservative). We therefore opted in the manuscript to show the mean scores on the conditions that we compare, while also showing the connected raw data points of individual participants in the same plot. One might argue that we should then use that same visualization in figure 3A, but note that this figure is merely intended to identify the peaks, i.e. it does not compare liberal to conservative.

      Furthermore, plotting the decoding time lines of individual participants would greatly diminish the clarity of this figure. Given our explanation, we hope the reviewer agrees with the approach that we chose, although we are of course open to modifying the figures if the reviewer has a suggestion for doing so while taking into account the points we raise here in our response.
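
      The argument that error bars on the raw condition scores do not convey the spread of the paired difference can be made concrete with a small numerical sketch. The scores below are invented purely for illustration and are not data from the study; the helper name `sem` is likewise hypothetical.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Invented scores for five participants: large differences between
# participants, but a small and consistent within-participant effect.
liberal = [0.55, 0.60, 0.65, 0.70, 0.75]
conservative = [0.58, 0.64, 0.68, 0.74, 0.78]

# Pairwise differences, one per participant.
differences = [c - l for c, l in zip(conservative, liberal)]

sem_liberal = sem(liberal)
sem_conservative = sem(conservative)
sem_difference = sem(differences)

print("SEM liberal:     ", sem_liberal)
print("SEM conservative:", sem_conservative)
print("SEM difference:  ", sem_difference)
```

      In this toy case the standard error of each condition is dominated by between-participant variability and is much larger than the standard error of the paired difference, so error bars on the raw scores would understate how consistent the effect is, whereas connected individual points show it directly.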

      Impact of the Work:

      This study effectively demonstrates a phenomenon that has been largely unexplored within the consciousness literature. Subjective measures may not reliably capture the construct they aim to measure due to criterion confounds. Future research on neural measures of consciousness should account for this issue, and no-report measures may be necessary until the criterion problem is resolved.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors could further elaborate on the results of the PAS to provide a clearer insight into the impact of response criteria, which is notably more complex than in other experiments. Specifically, the results demonstrate that conservative response criterion condition displays a considerably higher sensitivity compared to those with a liberal response criterion. It would be interesting to explore whether this shift in sensitivity suggests a correlation between changes in response criteria and conscious experiences, and how the interaction between sensitivity and response criteria can affect the neural measure of consciousness.

      We thank the reviewer for this suggestion. Note that the change in sensitivity that we observed is minor compared to the change we observed in response criterion (Hedges’ g criterion in exp 2 = 2.02, compared to Hedges’ g sensitivity/d’ in exp 2 = 0.42). However, we do investigate the effect of sensitivity (disregarding response criterion) on decoding accuracy. To this end we devised Figure 3C (for the full decoding time course see Supplementary Figure S1B). These figures show that the small behavioral sensitivity effects observed in both experiments (Hedges’ g sensitivity in exp 1 = 0.30, exp 2 = 0.42) did not translate into significant decoding differences between conservative and liberal in either experiment. This comes as no surprise given the small corresponding behavioral effects. Note that small sensitivity differences between liberal and conservative conditions are commonplace, plausibly driven by the fact that being liberal also involves being noisier in one’s response tendencies (i.e., sometimes randomly indicating presence). Further, the reviewer suggests that we might correlate changes in response criteria to changes in conscious experience. The only relevant metric of conscious experience for which we have data in this manuscript is the Perceptual Awareness Scale (PAS), so we assume the reviewer asks for a correlation of experimentally induced changes in response criterion with the equivalent changes in d’. To this end we computed the difference in the PAS-based d’ metric between conservative and liberal, as well as the difference in the PAS-based criterion metric between conservative and liberal, and correlated these across subjects (N=26) using a Spearman rank correlation. The result shows that these metrics do not correlate, r(24)=0.04, p=0.85. Note however that small-N correlations like these are only somewhat reliable for large effect sizes. With an N of 26 and a power of 80%, an effect size of at least r=0.5 is required for a correlation to be detectable, so even if a correlation were to exist we may not have had enough power to detect it. Due to these caveats we opted not to report this null correlation in the manuscript, but we are of course willing to do so if the reviewer and/or editor disagrees with this assessment.
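
      The power argument above can be checked with a short calculation. The sketch below uses the Fisher z approximation for a two-sided test at alpha = 0.05; this approximation is an assumption made here for illustration (it is derived for Pearson correlations and is only approximate for Spearman), and the function names are hypothetical rather than taken from the authors' analysis.

```python
import math
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |r| detectable with a two-sided test (Fisher z approximation)."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z approximation")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Under Fisher's transform, atanh(r) has standard error 1 / sqrt(n - 3).
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3))


def required_n(r, alpha=0.05, power=0.80):
    """Sample size needed to detect a correlation of size |r| (same approximation)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must satisfy 0 < |r| < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


print(f"minimum detectable r at N=26: {min_detectable_r(26):.2f}")
print(f"N required for r=0.5: {required_n(0.5)}")
```

      Under these assumptions the minimum detectable correlation at N = 26 comes out slightly above 0.5, in line with the figure quoted in the response.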

    1. eLife Assessment

      The authors investigated the mechanisms underlying the pause in striatal cholinergic interneurons (SCINs) induced by thalamic input, identifying that Kv1 channels play a key role in this burst-dependent pause. The valuable study provides mechanistic insights into how burst activity in SCINs leads to a subsequent pause, highlighting the involvement of D1/D5 receptors. The experimental evidence is solid; however, the reviewers suggest further clarifying the mechanism by which clozapine reduces D5R ligand-independent activity in the L-DOPA-off state.

    2. Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that 1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and 2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths:

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has firmly established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      The authors addressed most of the concerns I raised. However, the critical issue remains unresolved.

      I am still not convinced by the results presented in Fig. 6 and their interpretation. Since Clozapine acts as an agonist in the absence of an endogenous agonist, it may stimulate the D5R-cAMP-Kv1 pathway. Stimulation of this pathway should abolish the pause response mediated by thalamic stimulation in SCINs, rather than restoring the pause response. Clarification is needed regarding how Clozapine reduces D5R ligand-independent activity in the absence of dopamine (the endogenous agonist). In addition, the authors argued that a D5R antagonist does not work in the absence of dopamine, and that the D5R antagonist alone therefore did not restore the pause response. However, if the D5R-cAMP-Kv1 pathway is already active in the L-DOPA off state, why did the D5R antagonist not contribute to inhibition of the D5R pathway?

      Since Clozapine is not D5 specific and the Clozapine experiments were not conclusive, I recommend testing whether other receptors, such as the D2 receptor, contribute to the Clozapine-induced pause response in the L-DOPA-off state.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al. presents the role of D5 receptors (D5R) in regulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their findings provide a compelling model explaining the "on/off" switch of the CIN pause, driven by the distinct dopamine affinities of D2R and D5R. This mechanism, coupled with varying dopamine states, is likely critical for modulating synaptic plasticity in cortico-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorders 2022) with the loss of the pause response in LID mice and demonstrates the restoration of the pause through D1/D5 inverse agonism.

      Strengths:

      The study presents solid findings, and the writing is logically structured and easy to follow. The experiments are well-designed, properly combining ex vivo electrophysiology recording, optogenetics, and pharmacological treatment to dissect / rule out most, if not all, alternative mechanisms in their model.

      Weaknesses:

      While the manuscript is overall satisfying, one conceptual gap needs to be further addressed or discussed: the potential "imbalance" between D2R and D5R signaling due to the ligand-independent activity of D5R in LID. Given that D2R and D5R oppositely regulate CIN pause responses through cAMP signaling, investigating the role of D2R under LID off L-DOPA (e.g., by applying D2 agonists or antagonists, even together with intracellular cAMP analogs or inhibitors) could provide critical insights. Addressing this aspect would strengthen the manuscript in understanding CIN pause loss under pathological conditions.

    4. Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrate that this burst-dependent pause relies on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause, and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism: It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behaving animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behaving animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      (2) Terminology: The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behaving animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      (3) Kv1 Blocker Specificity: It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause? Clarification on this point would strengthen the interpretation of the results.

      (4) Role of D1 Receptors: While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      (5) Clozapine's Mechanism of Action: The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5. Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      Comments on revisions:

      The authors have addressed many of my concerns. However, I remain unconvinced that adding an 'ex vivo' experiment fully resolves the fundamental differences between the burst-dependent pause observed in slices - defined by the duration of a single AHP - and the pause response in CHINs observed in vivo, which may involve contributions from more than one prolonged AHP. In vivo, neurons can still fire action potentials during the pause, albeit at a lower frequency. Moreover, in behaving animals, pause duration does not vary with or without initial excitation. The mechanism proposed demonstrates that the pause duration, defined by the length of a single AHP, is positively correlated with preceding burst activity.

      To improve clarity, I recommend using the term 'SCIN pause' to describe the ex vivo findings, distinguishing them more explicitly from the 'pause response' observed in behaving animals. This distinction would help contextualize the ex vivo findings as potentially contributing to, but not fully representing, the pause response in vivo.

      Again, it would be helpful to present raw data for pause durations rather than relying solely on ratios. This approach would provide the audience with a clearer understanding of the absolute duration of the burst-dependent pause and allow for better comparison to the ~200 ms pause observed in behaving animals.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that (1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and (2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths:

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has firmly established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Thank you for your comments.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      There was no direct demonstration that the D5R-Kv1 pathway is dominant when dopamine levels are high. The term 'high' is ambiguous, and it raises the question of whether the authors believe that dopamine levels do not reach the threshold required to activate D5R under physiological conditions.

      We acknowledge that further work is necessary to clarify the role of the D5R in physiological conditions. While we haven’t found effects of the D1/D5 receptor antagonist SCH23390 on the pause response in control animals (Fig. 3), it is still possible that dopamine levels reach the threshold to stimulate D5R when burst firing of dopaminergic neurons contributes to dopamine release. We believe the pause response depends, among other factors, on the relative stimulation levels of SCIN D2 and D5 receptors, which is likely not an all-or-nothing phenomenon. To reduce ambiguity, we have eliminated the labels referring to dopamine levels in Figure 6F.

      Furthermore, the data presented in Figure 6 are confusing. If clozapine inhibits active D5R and restores the pause response, the D5R antagonist SCH23390 should have the same effect. The data suggest that clozapine-induced restoration of the pause response might be mediated by other receptors, rather than D5R alone.

      Thank you for letting us clarify this issue. Please note that the levels of endogenous dopamine 24 h after the last L-DOPA challenge in severe parkinsonian mice are expected to be very low. In the absence of an agonist, a pure D1/D5 antagonist would not exert an effect, as demonstrated with SCH23390 alone, which did not have an impact on the SCIN response to thalamic stimulation (Fig. 6). While clozapine can also act as a D1/D5 receptor antagonist, its D1/D5 effects in the absence of an agonist are attributed to its inverse agonist properties (PMID: 24931197). Notably, SCH23390 prevented the effect of clozapine, allowing us to conclude that ligand-independent D1/D5 receptor-mediated mechanisms are involved in suppressing the pause response in dyskinetic mice. We have now made this clearer in the third paragraph of the Discussion.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al. presents the role of the D5 receptor in modulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their model elucidates the on/off switch of the CIN pause, likely due to the different DA affinity between D2R and D5R. This machinery may be crucial in modulating synaptic plasticity in cortical-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorders 2022) with the loss of the pause response in LID mice.

      Strengths:

      The study had solid findings, and the writing was logically structured and easy to follow. The experiments are well-designed, and they properly combined electrophysiology recording, optogenetics, and pharmacological treatment to dissect/rule out most, if not all, possible mechanisms in their model.

      Thank you for your comments.

      Weaknesses:

      The manuscript is overall satisfying with only some minor concerns that need to be addressed. Manipulation of intracellular cAMP (e.g. using pharmacological analogs or inhibitors) can add additional evidence to strengthen the conclusion.

      Thank you for the suggestion. While we acknowledge that we are not providing direct evidence of the role of cAMP, we chose not to conduct these experiments because cAMP levels influence several intrinsic and synaptic currents beyond Kv1, significantly affecting membrane oscillations and spontaneous firing, as shown in Paz et al. 2021. However, we are modifying the fourth paragraph of the Discussion so there is no misinterpretation about our findings in the current work.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrate that this burst-dependent pause relies on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism:

      It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behaving animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behaving animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      Thank you for your valuable feedback. While the absence of an initial burst in some TANs in vivo may suggest the involvement of alternative or additional mechanisms, this does not exclude a participation of Kv1 currents. We have seen that subthreshold depolarizations induced by thalamic inputs are sufficient to produce an afterhyperpolarization (AHP) mediated by Kv1 channels (see Tubert et al., 2016, PMID: 27568555). Although such subthreshold depolarizations are not captured in current recordings from behaving animals, intracellular in vivo recordings have demonstrated an intrinsically generated AHP after subthreshold depolarization of SCIN caused by stimulation of excitatory afferents (PMID: 15525771). Additionally, when pause duration is plotted against the number of spikes elicited by thalamic input (Fig. 1G), we found that one elicited spike is followed by an interspike interval 1.4 times longer than the average spontaneous interspike interval. We acknowledge the potential involvement of additional factors, including a decrease of excitatory thalamic input coinciding with the pause, followed by a second volley of thalamic inputs (Fig. 1J-K, after observations by Matsumoto et al., 2001; PMID: 11160526), as well as the timing of elicited spikes relative to ongoing spontaneous firing (Fig. 1D-E). Dopaminergic modulation (Fig. 3) and regional differences among striatal regions (PMID: 24559678) may also contribute to the complexity of these dynamics.

      (2) Terminology:

      The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behaving animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      While we acknowledge that our study does not include in vivo evidence, we believe ex vivo preparations have been instrumental in elucidating the mechanisms underlying the responses observed in vivo. We also agree with previous ex vivo studies in using consistent terminology. However, we will clarify the ex vivo nature of our work in the abstract and bullet points for greater transparency.

      (3) Kv1 Blocker Specificity:

      It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause? Clarification on this point would strengthen the interpretation of the results.

      Thank you for letting us clarify this issue. In our previous work (Tubert et al., 2016) we showed that the Kv1.3 and Kv1.1 subunits are selectively expressed in SCIN throughout the striatum. Moreover, GABAergic transmission is blocked in our preparations. We are including a phrase to make it clearer in the manuscript (Results section, subheading “The pause response to thalamic stimulation requires activation of Kv1 channels”).

      (4) Role of D1 Receptors:

      While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Thank you for letting us clarify this point. We show that blocking D2R or nAChR reduces the pause only for strong thalamic stimulation eliciting 4 SCIN spikes (Figure 3G), whereas the D1/D5 agonist SKF81297 is able to reduce the pause induced by weaker stimulation as well (Figure 3C). In addition, the D1/D5 receptor antagonist SCH23390 does not modify the pause response (Figure 3C). This may indicate that nAChR-mediated dopamine release induced by thalamic-induced bursts more efficiently activates D2R compared to D5R. We speculate that, in this context, lack of D5R activation may be necessary to keep normal levels of Kv1.3 currents necessary for SCIN pauses.

      (5) Clozapine's Mechanism of Action:

      The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5.

      Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      Thank you for your insightful observation. We acknowledge the difficulty of targeting dopamine receptors pharmacologically due to the lack of highly selective D1/D5 inverse agonists. We used SCH23390, which is a highly selective D1/D5 receptor antagonist devoid of inverse agonist effects, to block clozapine’s ability to restore SCIN pauses (Figure 6C). This indicates that the restoration of SCIN pauses by clozapine depends on D1/D5 receptors. Furthermore, in a previous study, we demonstrated that clozapine’s effect on restoring SCIN excitability in dyskinetic mice (a phenomenon mediated by Kv1 channels in SCIN; Tubert et al., 2016) was not due to its action on serotonin receptors (Paz, Stahl et al., 2022). While our data do not rule out the potential contribution of other receptors, such as muscarinic acetylcholine receptors, we believe they strongly support the role of D1/D5 receptors. To reflect this, we added a statement discussing the potential contribution of receptors beyond D1/D5 in the last paragraph of the Discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The effect of MgTx was not consistent with the previous study (Tubert, 2016). I expected MgTx to increase the basal firing rate of cholinergic interneurons.

      Thank you for highlighting this. In our previous study we used ACSF in the recording pipette, instead of the intracellular solution (higher in potassium) used in the present study. This is likely related to the higher spontaneous firing rates of SCIN observed in the present study, which made the SCIN response stand out. In addition, our previous study analyzed the effect of MgTx on spontaneous firing frequency of SCIN isolated from major circuit regulation by adding CNQX and picrotoxin to the bath, while in this study we needed to preserve the thalamic input and only picrotoxin in the bath was used. Given these differences, the two conditions are not strictly comparable but rather give complementary information.

      (2) In the text, the authors claim that "SCINs recorded in the parkinsonian OFF-L-DOPA condition show an increase in membrane excitability that mimics changes acutely induced by SKF81297 in SCINs from control mice." However, the data for SKF81297 do not support this claim.

      We modified the text to make it clearer that the cited phrase refers to a previous publication (PMID: 35535012) in which SCIN intrinsic excitability was characterized via analysis of responses to somatic current injection in whole-cell recordings. In the present study Fig. 3D shows SKF81297 effects on interspike intervals during spontaneous activity with a trend towards increased firing, and Fig. 4E a lack of effect on “burst duration” for responses with different numbers of spikes elicited by thalamic afferent stimulation. 

      (3) I recommend testing whether other receptors, such as D2R, contribute to the clozapine-induced pause response in the L-DOPA off state.

      Thank you for your suggestion. We acknowledge that studying the role of D2R is important. However, our preliminary data suggest that a comprehensive follow-up study, which is beyond the scope of this manuscript, is necessary to understand their role.

      Reviewer #2 (Recommendations for the authors):

      (1) For Figure 1D-E, I understand that the authors are trying to state that the previous spontaneous spike contributes to a hyperpolarized window that induces a delay in the evoked spikes. However, it is almost impossible to discriminate between spontaneous and evoked spikes in this experiment. Furthermore, considering the tonic firing property, I highly suspect that even a sham control design (no optogenetic light) will give you a similar distribution as in Figure 1E (the longer in X1, the shorter in X2).

      We agree that some spikes following stimulus onset may have occurred independently of the light stimulus, as is also possible during behavioral tasks. We used the baseline recordings to estimate the effects of a sham stimulus as requested and included the data in Fig. 1E-F. As expected, the sham stimulation data showed a similar inverse relationship with the time elapsed from the preceding spike, but latencies were longer than with the stimulus (except for times close to the average ISI), suggesting that the optical stimulation increased the probability of evoking a spike (Fig. 1F). Remarkably, the pause following this threshold stimulation was significantly longer than the baseline ISI, as reported in the main text (Results section, last sentence of first paragraph).
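
      The inverse relationship expected from a sham stimulus can be illustrated with a minimal sketch. It assumes that sham events are simply time points placed within a baseline spike train; the function name `elapsed_and_latency` and the spike times are hypothetical and do not reproduce the authors' procedure.

```python
from bisect import bisect_right


def elapsed_and_latency(spike_times, event_time):
    """Return (time since preceding spike, latency to next spike) for one event.

    spike_times must be sorted in ascending order.
    """
    i = bisect_right(spike_times, event_time)
    if i == 0 or i == len(spike_times):
        raise ValueError("event must fall between two recorded spikes")
    return event_time - spike_times[i - 1], spike_times[i] - event_time


# Hypothetical, perfectly regular baseline firing (one spike per second).
baseline_spikes = [0.0, 1.0, 2.0, 3.0]
# Hypothetical sham event times placed within the baseline recording.
sham_times = [0.25, 1.5, 2.75]

pairs = [elapsed_and_latency(baseline_spikes, t) for t in sham_times]
for elapsed, latency in pairs:
    print(f"elapsed = {elapsed:.2f} s, latency to next spike = {latency:.2f} s")
```

      For regular tonic firing, elapsed time and latency sum to the interspike interval, so a longer elapsed time necessarily gives a shorter latency even without any stimulus. A real stimulus that evokes spikes would be expected to shorten latencies relative to this sham baseline.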

      (2) The authors used optogenetics to induce thalamic inputs to induce the pause after bursts. Considering CINs also receive inputs from different brain regions, e.g. cortex, does this phenomenon (the pause after bursts) also exist following cortical inputs?

      We did not study the SCIN response to cortical inputs, but both thalamic and cortical inputs seem to drive SCIN pause-responses as observed by others (PMID: 24553950).  

      (3) The effect of D5R inverse agonism, alone and in combination with a D5R agonist and antagonist, faithfully confirms the increase in ligand-independent activity of D5R in LID reported previously. It would be ideal to also directly modulate intracellular cAMP (as in the 2022 paper) to confirm the rescue effects on the CIN pause response.

      Please see our response in the public review.

      (4) In healthy conditions, the balance between D2R and D5R signaling (shown in Figure 6F left) switches between the pause and no-pause modes, which potentially contributes to cortical-striatal plasticity. How about in the LID off L-DOPA condition? Is it possible to rescue/modulate the pause on/off mode by D2R agonism in LID?

      We haven’t tested the effect of D2 agonists yet, but this is scheduled for follow-up studies.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors use the ratio of pause duration to baseline ISI to describe the pause, which is useful for detecting significant differences. However, it would be beneficial to also report the actual duration of the burst-dependent pause to provide readers with a clearer understanding of the variation in pauses.

      In all figures we report the average baseline ISI duration for each experiment / experimental condition, allowing readers to estimate actual pause durations. We added in the main text actual average pause durations corresponding to Fig. 1H, which are representative of those observed throughout the study.
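
      The relationship between the ratio measure and the absolute pause duration can be made explicit with a minimal sketch. The definitions used here are assumptions for illustration only: spikes within a short `response_window` after the stimulus are treated as evoked, and the pause is taken as the interval from the last evoked spike to the next spike. The function name, window lengths, and spike times are hypothetical and may differ from the definitions used in the manuscript.

```python
import statistics


def pause_ratio(spike_times, stim_time, baseline_window=1.0, response_window=0.05):
    """Return (pause duration, mean baseline ISI, pause / baseline ISI), in seconds."""
    spikes = sorted(spike_times)
    baseline = [t for t in spikes if stim_time - baseline_window <= t < stim_time]
    evoked = [t for t in spikes if stim_time <= t <= stim_time + response_window]
    later = [t for t in spikes if t > stim_time + response_window]
    if len(baseline) < 2 or not evoked or not later:
        raise ValueError("need >= 2 baseline spikes, an evoked spike, and a later spike")
    # Mean interspike interval before the stimulus.
    baseline_isi = statistics.mean([b - a for a, b in zip(baseline, baseline[1:])])
    # Pause: last evoked spike to the first spike after the response window.
    pause = later[0] - evoked[-1]
    return pause, baseline_isi, pause / baseline_isi


# Hypothetical spike train: 5 Hz baseline, two evoked spikes, then a pause.
spikes = [0.0, 0.2, 0.4, 0.6, 0.8, 1.01, 1.02, 1.50, 1.70]
pause, baseline_isi, ratio = pause_ratio(spikes, stim_time=1.0)
print(f"pause = {pause * 1000:.0f} ms, baseline ISI = {baseline_isi * 1000:.0f} ms, ratio = {ratio:.1f}")
```

      Reporting both the ratio and the baseline ISI lets a reader recover the absolute pause as their product, which is what allows comparison with pause durations measured in vivo.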

      (2) In Figure 3D, a more detailed comparison would be helpful, as there appears to be a significant difference between the SKF81297 group and others.

      We acknowledge that there might be a trend towards reduced ISIs; however, it was statistically non-significant (see legend of Figure 3). In addition, the effect of SKF81297 seems unrelated to this trend, as its effect is also seen in the presence of ZD7288, which substantially prolongs the baseline ISI (Fig. 4E-F).

    1. eLife Assessment

      This useful study investigates the impact of disrupting the interaction of RAS with the PI3K subunit p110α in macrophage function in vitro and inflammatory responses in vivo. Solid data overall support a role for RAS-p110α signalling in regulating macrophage activity and thus inflammation; however, for many of the readouts presented, the magnitude of the phenotype is not particularly pronounced. Further analysis would be required to substantiate the claims that RAS-p110α signalling plays a key role in macrophage function. Of note, the molecular mechanisms of how exactly p110α regulates the functions in macrophages have not yet been established.

    2. Reviewer #2 (Public review):

      Summary:

      Cell intrinsic signaling pathways controlling the function of macrophages in inflammatory processes, including in response to infection, injury or in the resolution of inflammation are incompletely understood. In this study, Rosell et al. investigate the contribution of RAS-p110α signaling to macrophage activity. p110α is a ubiquitously expressed catalytic subunit of PI3K with previously described roles in multiple biological processes including in epithelial cell growth and survival, and carcinogenesis. While previous studies have already suggested a role for RAS-p110α signaling in macrophage function, the cell intrinsic impact of disrupting the interaction between RAS and p110α in this central myeloid cell subset is not known.

      Strengths:

      Exploiting a sound previously described genetically engineered mouse model that allows tamoxifen-inducible disruption of the RAS-p110α pathway and using different readouts of macrophage activity in vitro and in vivo, the authors provide data consistent with their conclusion that alteration in RAS-p110α signaling impairs various but selective aspects of macrophage function in a cell-intrinsic manner.

      Weaknesses:

      My main concern is that for various readouts, the difference between wild-type and mutant macrophages in vitro or between wild-type and Pik3caRBD mice in vivo is modest, even if statistically significant. To further substantiate the extent of macrophage function alteration upon disruption of RAS-p110α signaling and its impact on the initiation and resolution of inflammatory responses, the manuscript would benefit from a more extensive assessment of macrophage activity and inflammatory responses in vivo.

      In the in vivo model, all cells have disrupted RAS-p110α signaling, not only macrophages. Given that other myeloid cells besides macrophages contribute to the orchestration of inflammatory responses, it remains unclear whether the phenotype described in vivo results from impaired RAS-p110α signaling within macrophages or from defects in other haematopoietic cells such as neutrophils, dendritic cells, etc.

      Inclusion of information on the absolute number of macrophages, and total immune cells (e.g. for the spleen analysis) would help determine if the reduced frequency of macrophages represents an actual difference in their total number or rather reflects a relative decrease due to an increase in the number of other immune cells.

      Comments on revisions:

      I thank the authors for addressing my comments.
      - I believe that additional in vivo experiments, or the inclusion of controls for the specificity of the inhibitor, which the authors argue are beyond the scope of the current study, are essential to address the weaknesses and limitations stated in my current evaluation.
      - While the neutrophil depletion suggests neutrophils are not required for the phenotype, there are multiple other myeloid cells, in addition to macrophages, that could be contributing or accounting for the in vivo phenotype observed in the mutant strain (not macrophage specific).
      - Inclusion of absolute cell numbers (in addition to the %) is essential. I do not understand why the authors are not including these data. Have they not counted the cells?
      - Lastly, inclusion of representative staining and gating strategies for all immune profiling measurements carried out by flow cytometry is important. This point has not been addressed, not even in writing.

    1. eLife Assessment

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. The presented evidence is conclusive for the involvement of m6A in DNA involving single cell imaging and mass spectrometry data. The authors present convincing evidence that the m6A signal does not result from bacterial contamination or RNA.

    2. Reviewer #1 (Public review):

      Summary:

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPR knockout screen.

      Strengths:

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 6mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.

      Strengths:

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.

      Weaknesses:

      The authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. While not an outright weakness, rescue experiments of the KO line with wild type and the METTL3 catalytic mutant would have further strengthened the evidence.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment  

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. Convincing evidence shows the involvement of m6A in DNA based on single cell imaging and mass spec data. The authors present evidence that the m6A signal does not result from bacterial contamination or RNA, but the text does not make this overly clear.

      We thank the editors for recognizing the importance of our work and the relevance of METTL3 and 6mA in DNA repair. We agree the evidence presented can be regarded as convincing, in that it includes validation with orthogonal approaches and excludes the source of 6mA being RNA or bacterial contamination.

      To clarify, the identification of 6mA in DNA, upon DNA damage, is based first on immunofluorescence observations using an anti-m6A antibody. In this setting, removal of RNA with RNase treatment fails to reduce the 6mA signal, excluding the possibility that the source of signal is RNA. In contrast, removal of DNA with DNase treatment removes all 6mA signal, strongly suggesting that the species carrying the N6-methyladenosine modification is DNA (Figure 3D, E). Importantly, in Figure 3F, G, we provide orthogonal, quantitative mass spectrometry data that independently confirm this finding. Mass spectrometry-liquid chromatography of DNA analytes, conclusively shows the presence of 6mA in DNA upon treatment with DNA damaging agents and excludes that the source is RNA, based on exact mass. 

      Cells only show the 6mA signal when treated with DNA damaging agents, and the 6mA is absent from untreated cells (Figure 3D, E, H, I). This provides strong evidence that the 6mA signal is not a result of bacterial contamination in our cell lines. Furthermore, our cell lines are routinely tested for mycoplasma contamination. It could be possible that stock solutions of DNA damaging agents may be contaminated, but this would need to be true for all individual drugs and stocks tested, which is highly unlikely. Moreover, the data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3H, I) provides strong evidence against bacterial contamination in our stocks.  

      In summary, we provide conclusive evidence, based on orthogonal methods, that the METTL3-dependent N6-methyladenosine modification is deposited in DNA, not RNA, in response to DNA damage and have now clarified these points in the results and discussion. 

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPR knockout screen.  

      Strengths:  

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 6mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized.  

      Weaknesses:  

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.  

      The increased presence of 6mA during DNA damage could result from methylation at the A base itself (within DNA) or from incorporation of pre-modified 6mA during DNA synthesis. Our data do not directly discriminate between these two mechanisms, and we clarified this point in the discussion.  

      Reviewer #2 (Public review):  

      Summary:  

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.  

      Strengths:  

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.  

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.  

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.  

      Weaknesses:  

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.  

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.  

      Our cell lines are routinely tested for bacterial contamination, specifically mycoplasma, and we state this information in the revised manuscript. 

      Importantly, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on the presence of DNA damage and not caused by contamination in the cell lines (Figure 3D, E, H, I). While it could be possible that stock solutions of DNA damaging agents may be contaminated, this would need to be the case for all individual drugs and stocks tested that induce 6mA, which is very unlikely. Finally, the data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3 H, I) provides strong evidence against bacterial contamination in our drug stocks.  

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.  

      While we agree that it could be important to understand the precise genomic location of 6mA in relation to DNA damage, this is outside the scope of the current study. Moreover, this exercise may prove unproductive. If 6mA is enriched in DNA at damage sites or as DNA is replicated, the genomic mapping of 6mA is likely to be stochastic. If stochastic, it would be impossible to obtain the read depth necessary to map 6mA accurately. 

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, the catalytic mutant and rescue of Mettl3 may be the further experiments to confirm the conclusion.

      We believe this to be an excellent suggestion from Reviewer #2 but we are unable to perform the proposed experiment at this time. We encourage future studies to explore the rescue experiment.  

      Reviewer #3 (Public review):  

      Summary:  

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.  

      Strengths:  

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.  

      Weaknesses:  

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.  

      These concerns must also be clearly in the limitations section and even in the results section which fails to nuance the authors' findings. 

      We agree with the reviewer that the presence and potential function of 6mA in mammalian DNA has been debated. Importantly, the debate regarding the presence and quantity of 6mA in DNA has been previously restricted to undamaged, baseline conditions. In complete agreement with this notion, we do not detect appreciable levels of 6mA in untreated cells. We revised the introduction section to present the debate about 6mA in DNA. We, however, want to highlight that our study provides, for the first time, convincing evidence (based on two orthogonal methods) that 6mA is present in DNA in response to a stimulus, DNA damage. We do not claim or provide any data that suggest 6mA is a baseline epigenetic mark.  

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma? 

      HT-29 is a cell line of colorectal origin and chemotherapeutic agents that introduce uracil and uracil derivatives in DNA, as those used in this study, are relevant for the treatment of colorectal cancer. As indicated above, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on DNA damage and not caused by a potential bacterial contamination (Figure 3D, E, H, I). Additionally, our cell lines are routinely tested for bacterial contamination, specifically mycoplasma. 

      (3) The single cell imaging of 6mA in various cells is nice. The results are confirmed by mass spec as an orthogonal approach. Another orthogonal and quantitative approach to assessing 6mA levels would be PacBio. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      We are confused by this point since an orthogonal approach to detect 6mA, mass spectrometry-liquid chromatography, was employed. This method does not use an antibody and confirms the increase of 6mA in DNA when cells were treated with DNA damaging agents. This data is presented in Figure 3F, G. 

      It is sensible to hypothesize that the localization of 6mA is consistent with DNA replication (like uracil deposition). In this event, the genomic mapping of 6mA is likely to be stochastic. This would make quantification with PacBio sequencing difficult because it would be very challenging to achieve the appropriate read depth to call a modified base. 

      Dot blots rely on an antibody and thus are not truly orthogonal to our immunofluorescence-based measurements. We preferred the mass spectrometry-liquid chromatography approach we took as a true orthogonal approach.  

      (4) The results of Figure 3 need further investigation and validation. If the results are correct the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNase treatment. Indeed, binding of 6mA antibodies to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.

      Based on this and the following comment, we are convinced that Reviewer #3 has overlooked two critical elements of our study:

      First, the immunofluorescence work presented in Figure 3, showing 6mA signal in response to DNA damage, uses cells that were pre-extracted to remove excess cytoplasmic RNA. This method is often used in immunofluorescence experiments of this kind. The pre-extraction method removes most of the cytoplasmic content, and the majority of the cytoplasmic m6A RNA signal. Supplementary Figure 3D shows cells that have not been pre-extracted prior to staining. These images show the cytoplasmic m6A signal is abundant if we do not perform the pre-extraction step. 

      If the antibody used to label 6mA significantly reacted with unmodified adenine, we would expect a large signal in untreated or untreated and denatured conditions. In contrast, an increase in 6mA is not observed in either case.

      Second, the orthogonal approach we employed, mass spectrometry coupled with liquid chromatography, measures 6mA DNA analytes specifically by exact mass. This approach does not depend on an antibody and yields results consistent with those from the immunofluorescence experiments. 

      (5) Given the lack of orthogonal validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficulty in assessing 6mA accurately in mammals acknowledged throughout.

      As discussed in response to prior comments, Figure 3 does provide two independent and orthogonal methods that demonstrate 6mA presence in DNA specifically, and not RNA, in response to DNA damage. Complementary and orthogonal datasets are presented using either immunofluorescence microscopy or mass spectrometry-liquid chromatography of extracted DNA. The latter method does not rely on an antibody and can discriminate 6mA DNA versus RNA based on exact mass. We revised the text to clarify that Figure 3F, G is a completely orthogonal approach.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):  

      The authors cited most of the related publications; however, the reviewer suggests that three 2015 papers in Cell (Dahua Chen's, Yang Shi's, and Chuan He's) and the 2016 Nature article (Andrew Xiao's) are worth citing here because they are milestone works that reported genomic DNA 6mA, in the first wave, in eukaryotic and mammalian genomes.

      Furthermore, Tao P. Wu and Andrew Z. Xiao's 2016 Nature article already emphasized that genomic DNA 6mA is enriched at H2A.X sites; therefore, that work indicated a link between DNA damage and repair and 6mA's functional role. The authors may add some comments or discussion on this point.

      Last but not least, the authors may also need to discuss the reported evidence of DNA 6mA's function in mitochondria.  

      We thank the reviewer for these suggestions. We revised our introduction and include additional references and discussion points, as suggested by the reviewer. 

      Reviewer #3 (Recommendations for the authors):  

      Minor points:  

      (1) In general, the manuscript is too verbose, and the amount of text can be dramatically reduced/sharpened. The introduction in particular is too long. 

      We revised the manuscript and reduced text when appropriate.

      (2) Each results section can also be condensed to improve clarity significantly. Indeed the results section reads like a 'Result & Discussion' section, which is then followed by a Discussion. Maybe the discussion section can be shortened to a 'conclusion'.

      We revised the results section when appropriate and reworked the discussion.

      Importantly, we revised the text related to Figure 3 as it does appear that Reviewer #3 did not appreciate key results present in this figure, specifically the orthogonal, mass spectrometry approach validating the discovery of 6mA DNA species (Figure 3F, G). We added a schematic as Figure 3F to further clarify this point as well. 

      (3) The accession number for sequencing data in GEO data should be provided.  

      The accession number is now provided in the manuscript: GSE282260.

      (4) All figures are unnecessarily small and in some cases, supporting figures from the supplementary data should be moved into the main figure to improve clarity. 

      The figures are of high image quality and can be enlarged easily. If there are specific figures that the reviewer believes will improve clarity, we would be happy to move them.

    1. eLife Assessment

      This study makes a valuable contribution to understanding Bayesian inference in dynamic environments by demonstrating how humans integrate prior beliefs with sensory evidence, revealing an overestimation of environmental volatility while accurately tracking noise. The evidence is solid, supported by robust model fitting and principled factorial model set analyses, though limitations in sample size and inconclusive findings on memory capacity tradeoffs reduce the overall impact. Future work should expand validation across datasets, enhance model comparisons, and explore the generalizability of reduced Bayesian frameworks to strengthen the conclusions and broader relevance of the study.

    2. Reviewer #1 (Public review):

      Summary

      Behavioural adjustments to different sources of uncertainty remain a hot topic in many fields including reinforcement learning. The authors present valuable findings suggesting that human participants integrate prior beliefs with sensory evidence to improve their predictions in dynamically changing environments involving perceptual decision-making, pointing to hallmarks of Bayesian inference. Fitting of a reduced Bayesian model to participant choice behaviour reveals that decision-makers overestimate environmental volatility, but are reasonably accurate in terms of tracking environmental noise.

      Strengths

      Using a perceptual decision-making task in which participants were presented with sequences of noisy observation in environments with constant volatility and variable noise, the authors demonstrate solid evidence in favour of reduced Bayesian models that can account for participant choice behaviour when its generative parameters are fitted freely. The work nicely complements recent work demonstrating the fitting of a full Bayesian model to human reinforcement learning. The authors' approach to the fitting of the model in a principled/factorial manner that is exhaustive performs the model comparison and highlights the need for further work in evaluating the model's performance in environments outside of its generative parameters. Overall the work further highlights the utility of using perceptual decision-making for Bayesian inference questions.

      Weaknesses

      Although data sharing and reanalysis of data are extremely welcome, particularly considering their utility for open science, the small sample size (N= 29) of the original dataset somewhat restricts the authors' ability to show more conclusive findings when it comes to deciphering the optimal memory capacity of the fitted models. It is likely that the relatively small sample size also contributes to certain key hypotheses not being confirmed intuitively, for example, the expected negative relationship between hazard rates and log (noise). The notion that the participants rely on priors to a greater extent in low noise environments relative to high noise may also indicate that they might misattribute noise as volatility, as higher noise in the environment usually obscures the information content of outcomes, and in the case of pure random/noisy sequences, it should increase reliance to priors as new sensory evidence becomes unreliable.

    3. Reviewer #2 (Public review):

      Summary:

      Meijer et al reanalyze behavioral data from a task in which people made predictions about the next in a sequence of localized sounds with the goal of understanding the computations through which people combine sensory experiences into a prior used for perception. The authors combine basic analyses of experimental data with model simulations and development and fitting of a factorial model set that includes a prominent model of change-point detection that has previously been shown to approximate Bayesian inference at a reduced computational cost and provide a good match to human prediction data (reduced Bayesian model). The authors present a number of findings, including a demonstration of key qualitative markers for Bayesian change-point detection, a tendency in humans to over-rely on recent observations, a lack of an inverse relationship between fit values of hazard rate and fit values of noise, support for a number of assumptions in the reduced Bayesian model, and a lack of evidence for reliance on memory systems beyond the extremely minimal requirements of that model.

      Strengths:

      The paper asks an important question and takes a number of useful steps toward answering it. In particular, the factorial model set constructed to examine a number of explicit assumptions in the models typically fit to change-point predictive inference task data was a very useful innovation, and in some cases showed clearly that assumptions in the model are necessary or at least better than the proposed alternatives. In particular, the paper develops a notion of memory capacity that allows for a continuum of models differing in their tradeoffs between computational cost and predictive precision. Another strength of the paper is that it relies on data that avoids sequential biases that can contaminate reported beliefs in more standard predictive inference tasks.

      Weaknesses:

      The primary weakness of the paper is that most of the definitive findings reported within it have already been reported elsewhere. That humans increase the influence of surprising outcomes indicative of change points, or to say this another way, decrease their reliance on prior information in such cases, has been fairly well established, as has the discovery that humans tend to overuse recent outcomes when making predictions. The most novel aspect of the paper, the exploration of reductions of the Bayesian ideal observer that rely on differing memory capacities, yielded results that are somewhat difficult to interpret, particularly because it is not clear that the task analyzed is diagnostic of the memory capacity term in the model, or if so, what the qualitative hallmarks of a high/low memory capacity model reduction might be.

    1. eLife Assessment

      This manuscript presents a useful mean-field model for a network of Hodgkin-Huxley neurons retaining the equations for ion exchange between the intracellular and extracellular space. The mean-field model derived in this work relies on approximations and heuristic arguments that, on the one hand, allow a closed-form derivation of the mean-field equations, but also raise questions about their justifications and the degree to which the results agree with experiments as well as direct numerical simulations. Therefore, the evidence for the utility of this approach is at present incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors derive a mean-field model for a network of Hodgkin-Huxley neurons retaining the equations for ion exchange between the intracellular and extracellular space.

      The mean-field model derived in this work relies on approximations and heuristic arguments that, on the one hand, allow a closed-form derivation of the mean-field equations, and on the other hand restrict its validity to a limited regime of activity corresponding to quasi-synchronous neuronal populations. Therefore, rather than an exact mean-field representation, the model provides a description of a mesoscopic population of connected neurons driven by ion exchange dynamics.

      Strengths:

      The idea of deriving a mean-field model that relates the slow-timescale biophysical mechanism of ion exchange and transportation in the brain to the fast-timescale electrical activities of large neuronal ensembles.

      Weaknesses:

      The idea underlying this work is not completely implemented in practice.

      The derived mean field model does not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes. The agreement with the in vitro experiment is hardly evident, both for the mean-field model and for the network model. The assumptions made to derive the closed-form equations of the mean-field model have not been justified by any biological reason, they just allow for the mathematical derivation. The final form of the mean-field equations does not clarify whether or not microscopic variables are used together with macroscopic variables in an inconsistent mixture.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to develop a neural mass model characterized by a few collective variables mimicking the dynamics of a network of Hodgkin-Huxley neurons encompassing ion-exchange mechanisms. They describe in detail the derivation of the mean-field model, then compare experimental results obtained for the hippocampus of a mouse with the neural network simulations and the mean-field results. Furthermore, they report a bifurcation analysis of the developed model and a simulation of a small network containing various coupled neural masses, somehow moving towards the simulation of an entire connectome.

      Strengths:

      The authors attempt to develop a mean-field model for a globally coupled network of heterogeneous Hodgkin-Huxley neurons with an explicit ion exchange mechanism between the cell interior and exterior.

      Weaknesses:

      (1) It seems that the reduction methodology that is employed is not the most suitable one for the single-neuron model they are considering.

      (2) The authors' derivation of the neural mass model is based on several assumptions, not all of which are well justified.

      (3) The formulation of the mean-field derivation is unnecessarily complicated. It could be heavily simplified by following previously published approaches to derive biologically realistic neural masses.

      (4) The model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.

      General Statements:

      The authors honestly declared the many limitations of their approach. It is assumed that the results of the mean-field are somehow inconsistent with the neural network simulations as expected.

      The authors suggest employing this model for simulations on the whole connectome to follow seizure propagation; however, I believe that the Epileptor remains superior to this model in this respect. The present model does include biophysical parameters, but their correspondence with the ones employed in the network dynamics remains elusive, due to the many assumptions required to derive this mean-field model. Furthermore, it is more complicated than the Epileptor; I do not think that the present model will be widely employed by the community.

    1. eLife Assessment

      This important study uses diffusion magnetic resonance imaging to non-invasively map the white matter fibres connecting the zona incerta and cortex in humans. The authors present convincing evidence to indicate that these connections are organized along a rostro-caudal axis. The findings will be of interest to researchers interested in neuroanatomy and cortico-subcortical connectivity.

    2. Reviewer #1 (Public review):

      Summary:

      This is a study that used 7T diffusion MRI in subjects from a Human Connectome Project dataset to characterize the zona incerta, an area of gray matter whose involvement has been demonstrated in a broad range of behavioral and physiologic functions. The authors employ tractography to model white matter tracts that involve connections with the ZI and use clustering techniques to segment the ZI into distinct subregions based on similar patterns of connectivity. The authors report a rostral-caudal organization of the ZI's streamlines where rostrally-projecting tracts are rostrally-positioned in the ZI and caudally-projecting tracts are caudally-positioned in the ZI.

      Strengths:

      The paper presents robust findings that demonstrate subregions of the human ZI that appear to be structurally distinct using a combination of spectral clustering and diffusion map embedding methods. The results of this work can contribute to our understanding of the anatomy and structural connectivity of the ZI, allowing us to further explore its role as a neuromodulatory target for various neurological disorders.

      Weaknesses:

      There should be further discussion of the clustering methods employed and why they are appropriate for the pertinent data. Additionally, the limitations of analyzing solely the cortical connections of the zona incerta should be addressed, as anatomical studies of the ZI have shown significant involvement of the ZI in tracts projecting to deep brain regions.

    3. Reviewer #2 (Public review):

      Summary:

      Haast et al. investigated the organization of the zona incerta (ZI) in the human brain based on its structural connectivity to the neocortex. They found that the ZI is organized according to a primary rostro-caudal gradient, where the rostral ZI is more strongly connected to the prefrontal cortex and the caudal ZI to the sensorimotor cortex. They also found that the central region of the ZI is differently connected to the neocortex compared with the rostral and caudal regions, and could be important as a deep brain stimulation target for the treatment of essential tremor.

      Strengths:

      I think the overall quality of this work is great, and the results are presented in a very clear and organized manner. I particularly appreciate the effort that the authors put into validating the results using 7T and 3T data, as well as test-retest data.

      Weaknesses:

      That being said, I was left with a couple of concerns after reading the paper.

      (1) Although the authors discussed animal evidence for a dorsal-ventral organization of the ZI, I thought that the evidence they presented for it in this paper was not so convincing. In Figure S5, the second gradient (G2) shows a clear dorsoventral pattern, but this pattern seems to primarily separate the ZI and H fields rather than show an internal topology of the ZI. This is more likely the case given that there are two bands (superior and inferior) of high G2 values surrounding a single band (middle) of low G2 values. The evidence for the rostrocaudal gradient, on the other hand, is quite convincing.

      (2) HCP data is still too advanced for clinical translation. Although 3T is becoming more and more prevalent for presurgical planning, the HCP 3T dataset is acquired with a voxel size of 1.25 mm, which is a far higher resolution than the typical clinical scan. It would be very useful for clinical readers to see what individual subject replicability looks like if the data were acquired at the more typical voxel size of 2 mm. This could be achieved by replicating the analysis on a downsampled version of the HCP data that more closely resembles clinical data. This is understandably a large undertaking, so it could be left to future validation work.

  2. Jan 2025
    1. eLife Assessment

      This important work proposes a neural network model of interactions between the prefrontal cortex and basal ganglia to implement adaptive resource allocation in working memory, where the gating strategies for storage are adjusted by reinforcement learning. Numerical simulations provide convincing evidence for the superiority of the model in improving effective capacity, optimizing resource management, and reducing error rates, as well as for its human-like performance. This work will be of broad interest to computational and cognitive neuroscientists, and may also interest machine-learning researchers who seek to develop brain-inspired machine-learning algorithms for memory.

    2. Reviewer #1 (Public review):

      Summary:

      In this research, Soni and Frank investigate the network mechanisms underlying capacity limitations in working memory from a new perspective, with a focus on Visual Working Memory (VWM). The authors have advanced beyond the classical neural network model, which incorporates the prefrontal cortex and basal ganglia (PBWM), by introducing an adaptive chunking variant. This model is trained using a biologically-plausible, dopaminergic reinforcement learning framework. The adaptive chunking mechanism is particularly well-suited to the VWM tasks involving continuous stimuli and elegantly integrates the 'slot' and 'resource' theories of working memory constraints. The chunk-augmented PBWM operates as a slot-like system with resource-like limitations.

      Through numerical simulations under various conditions, Soni and Frank demonstrate that the performance of the chunk-augmented PBWM model surpasses that of the no-chunk control model. The improvements are evident in enhanced effective capacity, optimized resource management, and reduced error rates. The retention of these benefits, even with increased capacity allocation, suggests that working memory limitations are due to a combination of factors, including efficient credit assignment that is learned flexibly through reinforcement learning. In essence, this work addresses fundamental questions related to a computational working memory limitation using a biologically-inspired neural network, and thus has implications for conditions such as Parkinson's disease, ADHD and schizophrenia.

      Strengths:

      The integration of mechanistic flexibility, reconciling two theories for WM capacity into a single unified model, results in a neural network that is both more adaptive and human-like. Building on the PBWM framework ensures the robustness of the findings. The addition of the chunking mechanism tailors the original model for continuous visual stimuli. Chunk-stripe mechanisms contribute to the 'resource' aspect, while input-stripes contribute to the 'slot' aspect. This combined network architecture enables flexible and diverse computational functions, enhancing performance beyond that of the classical model.

      Moreover, unlike previous studies that design networks for specific task demands, the proposed network model can dynamically adapt to varying task demands by optimizing the chunking gating policy through RL.

      The implementation of a dopaminergic reinforcement learning protocol, as opposed to a hard-wired design, leads to the emergence of strategic gating mechanisms that enhance the network's computational flexibility and adaptability. These gating strategies are vital for VWM tasks and are developed in a manner consistent with ecological and evolutionary learning in humans. Further examination of how reward prediction error signals, both positive and negative, collaborate to refine gating strategies reveals the crucial role of reward feedback in fine-tuning the working memory computations and the model's behavior, aligning with the current neuroscientific understanding that reward matters.

      Assessing the impact of a healthy balance of dopaminergic RPE signals on information manipulation holds implications for patients with altered striatal dopaminergic signaling.

      Comments on revisions:

      In the revised version, the authors have thoroughly addressed all the questions raised in my previous review. They have clarified the model architecture, provided detailed explanations of the training process, and elaborated on the convergence of the optimization.

      Additionally, Reviewer 2 made a very constructive suggestion: Can related cognitive functions or phenomena emerge from the model? The newly added analysis and results highlighting the recency effect directly address this question and significantly strengthen the paper.

    3. Reviewer #2 (Public review):

      Summary:

      This paper utilizes a neural network model to investigate how the brain employs an adaptive chunking strategy to effectively enhance working memory capacity, which is a classical and significant question in cognitive neuroscience. By integrating perspectives from both the 'slot model' and 'limited resource models,' the authors adopted a neural network model encompassing the prefrontal cortex and basal ganglia, introduced an adaptive chunking strategy, and proposed a novel hybrid model. The study demonstrates that the brain can adaptively bind various visual stimuli into a single chunk based on the similarity of color features (a continuous variable) among items in visual working memory, thereby improving working memory efficiency. Additionally, it suggests that the limited capacity of working memory arises from the computational characteristics of the neural system, rather than anatomical constraints.

      Strengths:

      The neural network model utilized in this paper effectively integrates perspectives from both slot models and resource models (i.e., resource-like constraints within a slot-like system). This methodological innovation provides a better explanation for the limited capacity of working memory. By simulating the neural networks of the prefrontal cortex and basal ganglia, the model demonstrates how to optimize working memory storage and retrieval strategies through reinforcement learning (i.e., the efficient management of access to and from working memory). This biological simulation offers a novel perspective on human working memory and provides new explanations for the working memory difficulties observed in patients with Parkinson's disease and other disorders. Furthermore, the effectiveness of the model has been validated through computational simulation experiments, yielding reliable and robust predictions.

      Comments on revisions:

      The authors have already answered all my questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important work proposes a neural network model of interactions between the prefrontal cortex and basal ganglia to implement adaptive resource allocation in working memory, where the gating strategies for storage are adjusted by reinforcement learning. Numerical simulations provide convincing evidence for the superiority of the model in improving effective capacity, optimizing resource management, and reducing error rates, as well as solid evidence for its human-like performance. The paper could be strengthened further by a more thorough comparison of model predictions with human behavior and by improved clarity in presentation. This work will be of broad interest to computational and cognitive neuroscientists, and may also interest machine-learning researchers who seek to develop brain-inspired machine-learning algorithms for memory.

      We thank the reviewers for their thorough and constructive comments, which have helped us clarify, augment and solidify our work. Regarding the suggestion to include a “more thorough comparison with human behavior”, we believe this comment reflects one of the reviewers’ suggestions to compare with sequential order effects. We now include a new section with simulations showing that the network exhibits clear recency effects in accordance with the literature, and where such recency effects are known to be related to WM interference and not due to passive decay. Overall, our work makes substantial contact with human behavioral patterns that have been documented in the human literature (and which as far as we know have not been jointly captured by any one model), such as the shape of the error distributions, including probability of recall and variable precision; attraction to recently presented items, sensitivity to reinforcement history, set-size dependent chunking, recency effects, dopamine manipulation effects, as well as a range of human data linking capacity limitations to frontostriatal function. It also provides a theoretical proposal for the well-established phenomenon of capacity limitations in humans, suggesting that they arise due to difficulty in WM management.

      Below we address each reviewer individually, responding to each comment and providing the relevant location in the paper that the changes and additions were made. Reviewer responses are included in blue/bold for clarity.  

      Public Reviews:

      Reviewer 1:

      Thank you for your comments. We appreciate your statements of the strengths of this paper and your suggestions to improve this paper.

      First, the method section appears somewhat challenging to follow. To enhance clarity, it might be beneficial to include a figure illustrating the overall model architecture. This visual aid could provide readers with a clearer understanding of the overall network model.

      Additionally, the structure depicted in Figure 2 could be potentially confusing. Notably, the absence of an arrow pointing from the thalamus to the PFC and the apparent presence of two separate pathways, one from sensory input to the PFC and another from sensory input to the BG and then to the thalamus, may lead to confusion. While I recognize that Figure 2 aims to explain network gating, there is room for improvement in presenting the content accurately.

      As suggested, we added a figure (new figure 2) illustrating the overall model architecture before expanding it to show the chunking circuitry. This figure also shows the projections from thalamus to PFC (we preserve the previous figure 2, now figure 3, as an example sequence of network gating decisions, in more abstract form to help facilitate a functional understanding of the sequence of events without too much clutter). We also made several other general clarifications to the methods sections to make it more transparent and easier to follow, as per your suggestions.   

      Still, for the method part, it would enhance clarity to explicitly differentiate between predesigned (fixed) components and trainable components. Specifically, does the supplementary material state that synaptic connection weights in striatal units (Go&NoGo) are trained using XCAL, while other components, such as those in the PFC and lateral inhibition, are not trained (I found some sentences in 'Limitations and Future Directions')?

      We have now explicitly specified learned and fixed components. We have further explained the role of XCAL and how striatal Go/NoGo weights are trained. We have also added clarification on how gating policies are learned via eligibility traces and synaptic tags.

      I'm not sure about the training process shown in Figure 8. It appears that the training may not have been completed, given that the blue line representing the chunk stripe is still ascending at the endpoint. The weights depicted in panel d) seem to correspond with those shown in panels b) and c), no? Then, how is the optimization process determined to be finished? Alternatively, could it be stated that these weight differences approach a certain value asymptotically? It would be better to clarify the convergence criteria of the optimization process.

      The training process has been clarified and we specify (in the last paragraph of the Base PBWM Model) how we determine when training is complete. We also can confirm that the network behavior has stabilized in learning even if the Go/NoGo weights continue to grow over time for the chunked layer (due to imperfect performance and reinforcement of the chunk gating strategy).

      Reviewer 2:

      Thank you for your comments. We appreciate your notes on the strengths of the paper and your suggestions to help improve the paper.

      The model employs a spiking neural network, which is relatively complex. Additionally, while this paper validates the effectiveness of chunking strategies used by the brain to enhance working memory efficiency through computational simulations, further comparison with related phenomena observed in cognitive neuroscience experiments on limited working memory capacity, such as the recency effect, is necessary to verify its generalizability.

      Thank you for proposing we add in more connections with human WM. Based on your specific recommendation, we have included the section “Network recapitulates human sequential effects in working memory”, where we discuss recency effects in human working memory and how our model recapitulates this effect. We have also made the connections to human data and human work more explicit throughout the manuscript (Figure 4c). As noted in response to the assessment, we believe our model does make contact with a wide variety of cognitive neuroscience data in human WM, such as the shape of the error distributions, including probability of recall and variable precision; attraction to recently presented items, sensitivity to reinforcement history, set-size dependent chunking, recency effects, and dopamine manipulation effects, as well as a range of human data linking capacity limitations to frontostriatal function. It also provides a theoretical proposal for the well-established phenomenon of capacity limitations in humans, suggesting that they arise due to difficulty in WM management.

      Recommendations For The Authors:

      Reviewer 1:

      I appreciate the authors' clear discussion of the limitations of this work in the section "Limitations and Future Directions". The development of a comprehensive model framework to overcome these constraints would require a separate paper; still, I am curious whether the authors have attempted any experiments, such as using two identically designed chunking layers, that could partially support the assumptions presented in the paper.

      Expanding the number of chunking layers is a great future direction. We felt that it was most effective for this paper to begin with a minimal setup as a proof of concept. We hypothesize that, given our results, a reinforcement learning algorithm would be able to learn to select the best level of abstraction (degree of chunking) in more continuous form, but would require more experience across a range of tasks to do so.

      I'm not sure whether it's appropriate that "Frontostriatal Chunking Gating..." precedes "Dopamine Balance is...", maybe it would be better to reverse the order thus avoiding the need to mention the role of dopamine before delving into the details. Additionally, including a summary at the end of the Introduction, outlining how the paper is organized, could provide readers with a clear roadmap of the forthcoming content.

      We appreciate this suggestion. After careful thought, we wanted to preserve the order because we felt it was important to make the direct connection between set size and stripe usage following the discussion on performance based on increasing stripes.  

      The authors could improve the overall polish of the paper. The equations in the Method section are somewhat confusing: Eq. (2) appears incorrect, as it lacks a weight w_i and n should presumably be in the denominator. For Eq. (3), the comma should be replaced with ']'... It would be advisable to cross-reference these equations with the original O'Reilly and Frank paper for consistency.

      Thank you for pointing out the errors in the method equations; those equations were indeed rendering incorrectly. We have fixed this problem.

      Additionally, there are frequent instances of missing figure and reference citations (many '?'s), and it would be beneficial to maintain consistent citation formatting throughout the paper: sometimes citations are presented as "key/query coding (Traylor, Merullo, Frank, and Pavlick, 2024; see also Swan and Wyble, 2014)", while other times they are written as "function (O'Reilly & Frank, 2006)"...

      Lastly, there is an empty '3.1' section in the supplementary material that should be addressed.

      The citation issues were fixed. The supplementary information was cleaned and the missing section was removed. Thank you for mentioning these errors.  

      Reviewer 2:

      Thank you for the following recommendations and suggestions. We respond to each individual point based on the numbering system used in your review.  

      (1) This paper utilizes the experimental paradigm of visual working memory, in which different visual stimuli are sequentially loaded into the working memory system, and the accuracy of memory for these stimuli is calculated.

      The authors could further plot the memory accuracy curve as the number of items (N) increases, under both chunking and non-chunking strategies. This would allow for the examination of whether memory accuracy suddenly declines at a specific value of N (denoted as Nc), thereby determining the limited capacity of working memory within this experimental framework, which is about 4 different items or chunks. Additionally, it could be investigated whether the value of Nc is larger when the chunking strategy is applied.

      We have included an additional plot (Probability of Recall) as a supplemental figure to Figure 5 to explore the probability of recall as a function of set size for both chunking and no chunking models.  This plot shows that the chunking model increases probability of recall when set size exceeds allocated capacity (but that nevertheless both models show decreases in recall with set size, consistent with the literature).

      (2) The primacy effect or recency effect observed in the experiments and traditional working memory models, including the slot model and the limited resource model, should be examined to see if it also appears in this model.

      The literature on human working memory shows a prevalent recency effect (but not a primacy effect, which is thought to be due to episodic memory, and which is not included in our model). We have added a section showing that our model demonstrates clear recency effects.

      (3) The construction of the model and the single neuron dynamics involved need further refinement and optimization:

      Model Description: The details of the model construction in the paper need to be further elaborated to help other researchers better understand and apply the model in reproducing or extending research. Specifically:

      a) The construction details of different modules in the model (such as Input signal, BG, striatum, superficial PFC, deep PFC) and the projection relationships between different modules. Adding a diagram to illustrate the network construction would be beneficial.

      To aid in the understanding of the model construction and model components, we have included an additional figure (Figure 1: Base Model) that explains the key layers and components of the model.  We have also altered the overall model figures to show more clearly that the inputs project to both PFC and striatum, to highlight that information is temporarily represented in superficial PFC layers even before striatal gating, which is needed for storage after the input decays.

      We have expanded the methods and equations and we also provide a link to the model github for purposes of reproducibility and sharing.  

      A base model figure was added to specify key connections.  

      b) The numbers of excitatory and inhibitory neurons within different modules and the connections between neurons.

      We added clarification on the type of connections between layers (specifying which are fixed and learned). We have also added the size of layers in a new appendix section “Layer Sizes and Inner Mechanics”.

      c) The dynamics of neurons in different modules need to be elaborated, including the description of the dynamic equations of variables (such as x) involved in single neuron equations.

      Single neuron dynamics are explained in equations 1-4. Equations 5-6 explain how activation travels between layers. The specific inhibitory dynamics in the chunking layer are elaborated in Figure 4. PBWM Model and Chunking Layer Details. The Appendix section “Neural model implementational details” states the key equations, neural information and connectivity. Since there is a large corpus of background information underlying these models, we have linked the Emergent github and specifically the Computational Cognitive Neuroscience textbook, which has a detailed description of all equations. For the sake of paper length and readability, we chose the most relevant equations that distinguish our model.

      d) The selection of parameters in the model, especially those that significantly affect the model's performance.

      The appendix section on hyperparameter search details some of the key parameters and why those values were chosen.

      e) The model employs a sequential working memory paradigm; the forms of external stimuli involved in the encoding and recalling phases (including their mathematical expressions, durations, strengths, and other parameters) need to be elaborated further.

      We appreciate this comment. We have expanded the Appendix section “Continuous Stimuli” to include the details of stimuli presentation (including durations etc).  

      (4) The figures in the paper need optimization. For example, the size of the schematic diagram in Figure 2 needs to be enlarged, while the size of text such as "present stimulus 1, 2, recall stimulus 1" needs to be reduced. Additionally, the citation of figures in the main text needs to be standardized. For example, Figure 1b, Figure 1c, etc., are not cited in the main text.

      The task sequence figure (original Figure 2) has been modified and, following your suggestions, text sizes have been adjusted.

      (5) Section 3.1 in the appendix is missing.

      Supplemental section 3.1 has been removed.

    1. eLife Assessment

      This report used a new double knockout mouse model to investigate the role of two neuropeptides, substance P and CGRPa, in pain signaling. There is convincing evidence that double knockout of these two molecules, both of which have historically been associated with pain, does not affect nociception or acute pain behaviors in males and females. This finding is fundamental, as it challenges the hypothesis that these peptides are essential for pain transmission, even when targeted together. This paper will be of interest to those interested in the neurobiology of pain and/or neuropeptide function.

    2. Reviewer #2 (Public review):

      Summary:

      The paper aimed to examine the effect of co-ablating Substance P and CGRPα peptides on pain using Tac1 and Calca double knockout (DKO) mice. The authors observed no significant changes in acute, inflammatory, and neuropathic pain. These results suggest that Substance P and CGRPα peptides do not play a major role in mediating pain in mice. Moreover, they reveal that the lack of behavioral phenotype cannot be explained by the redundancy between the two peptides, which are often co-expressed in the same neuron.

      Strengths:

      The paper uses a straightforward approach to address a significant question in the field. The authors confirm the absence of Substance P and CGRPα peptides at the levels of DRG, spinal cord, and midbrain. Subsequently, they employ a comprehensive battery of behavioral tests to examine pain phenotypes, including acute, inflammatory, and neuropathic pain. Additionally, they evaluate neurogenic inflammation by measuring edema and extravasation, revealing no changes in DKO mice. The data are compelling, and the study's conclusions are well-supported by the results. The manuscript is succinct and well-presented.

    3. Reviewer #3 (Public review):

      In this study, the authors aimed to determine the role of a global double knockout (DKO) of substance P and CGRPα in modulating acute and chronic pain transmission. After successfully generating and validating the DKO mouse model, they conducted a series of behavioral pain assessments to evaluate the role of these neuropeptides in acute and chronic pain. Despite the well-established involvement of substance P and CGRPα in chronic pain, their findings revealed that the global loss of both neuropeptides did not affect the transmission of either acute or chronic pain.

      A major strength of the paper is that they validated their double knockout mouse model before using a comprehensive array of both acute and chronic pain tests to reach their conclusions. One minor weakness is that the sample sizes (n) for some of the studies conducted are low.

      The conclusions made by the authors are largely supported by their results and the authors successfully achieved their aim of investigating the role of simultaneous inhibition of substance P and CGRPα in pain transmission.

      This study offers valuable insights into our understanding of the pain pathways. Both Substance P and CGRPα neuropeptides and their receptors were considered key players in pain signaling due to their high expression in pain-responsive neurons. However, targeting these peptides in clinical trials has not been successful. By investigating the simultaneous inhibition of substance P and CGRPα through the generation of Tac1 and Calca double knockout (DKO) mice, the authors addressed an important gap in the field. Their comprehensive assessment of pain behaviors across a range of acute and chronic pain models revealed an unexpected outcome: the absence of both neuropeptides did not significantly alter pain responses. This finding is pivotal, as it challenges the hypothesis that these peptides are essential for pain transmission, even when targeted together.

      Comments on revisions:

      All my previous concerns have been addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      MacDonald et al. investigated the consequence of double knockout of substance P and CGRPα on pain behaviors using a newly created mouse model. The investigators used two methods to confirm knockout of these neuropeptides: traditional immunolabeling and a neat in vitro assay where sensory neurons from either wildtype or double knockout mice are co-cultured with substance P "sniffer cells", HEK cells stably expressing NKR1 (a substance P receptor), GCaMP6s and Gα15. It should be noted that functional assays confirming CGRPα knockout were not performed. Subsequently, the authors assayed double knockout (DKO) and wildtype (WT) mice in numerous behavioral assays using different pain models, including acute pain and itch stimuli, intraplantar injection of Complete Freund's Adjuvant, prostaglandin E2, capsaicin, AITC, oxaliplatin, as well as the spared nerve injury model. Surprisingly, the authors found that pain behaviors did not differ between DKO and WT mice in any of the behavioral assays or pain paradigms. Importantly, female and male mice were included in all analyses. These data are important and significant, as both substance P and CGRPα have been implicated in pain signaling, though the magnitude of the effect of a single knockout of either gene has been variable and/or small between studies.

      The conclusions of the study are largely supported by the data; however, additional experimental controls and analyses would strengthen the authors' claims.

      We thank the reviewer for their insightful comments and have answered them below.

      (1) The authors note that single knockout models of either substance P or CGRPα have produced variable effects on pain behaviors that are study-dependent. Therefore, it would have strengthened the study if the authors included these single knockout strains in a side-by-side analysis (in at least some of the behavioral assays), as has been done in prior studies in the field when using double- or triple-knockout mouse models (for example, see PMID: 33771873). If, in the authors' hands, single knockouts of either peptide also show no significant differences in pain behaviors, then the finding that double knockouts also do not show significant differences would be less surprising.

      In our study, we found no phenotypic differences between WT and DKO mice, suggesting Substance P and CGRPα are largely dispensable for pain behavior. We agree that if we had observed significant changes in behavior, it would have been interesting to examine the effects of knocking out each gene individually to determine which peptide is responsible for the phenotype. However, given the double deletion had no effect, we can predict that loss of each alone would have no or minor effects. In line with this, a more recent study that comprehensively phenotyped the Calca KO mouse found no deficits in a range of danger-related behaviors (PMID: 34376756). Overall, as we are reporting negative data about the double KO, we do not believe extensive studies of the single KOs are necessary to support the findings of our paper.

      (2) It is unclear why the authors only show functional validation of substance P knockout using "sniffer" cells, but not CGRPα. Inclusion of this experiment would have added an additional layer of rigor to the study.

      Imaging of CGRPα release is more challenging using the ‘sniffer’ approach because functional CGRP receptors require the expression of two genes: Calcrl (or Calcr) along with Ramp1. We now have succeeded in generating a new stable cell line expressing Calcrl and Ramp1, along with GCaMPs and human Galpha15 and include new data in the revised Figure 1F-H and Figure Supplement 1B. These cells respond robustly to CGRPalpha, but not to SP. In contrast, the existing SP cell line responds to SP but not CGRPalpha. Capsaicin evokes a strong response in these cells in co-culture with DRGs. This response is dramatically reduced in the DKO. This data therefore confirms our mice have a loss of CGRPalpha signaling as indicated by IHC.

      (3) The authors should be a bit more reserved in the claims made in the manuscript. The main claim of the study is that "CGRPα and substance P are not required for pain transmission." However, the authors also note that neuropeptides can have opposing effects that may produce a net effect of no change. In my view, the data presented show that double knockout of substance P and CGRPα do not affect somatic pain behaviors, but do not preclude a role for either of these molecules in pain signaling more generally. Indeed, the authors also note that these neuropeptides could be involved in nociceptor crosstalk with the immune or vascular systems to promote headache. The authors only assayed pain responses to glabrous skin stimulation. How the DKO mice would behave in orofacial pain assays, migraine assays, visceral pain assays, or bone/joint pain assays, for example, was not tested. I do not suggest the authors include these experiments, only that they address the limitations/weaknesses of their study more thoroughly.

      The reviewer makes an important point that we agree with. Our study assesses acute and chronic pain in peptide DKO mice lacking Substance P and CGRPα. Most of our data focuses on the hindpaw as pain in the paw is the gold-standard approach for phenotyping pain targets and numerous well-validated chronic pain models have been developed for this body site.  However, to extend the conclusions to other tissues, we did also look at visceral pain and GI distress using acetic acid and LiCl models (Figure 2J and Figure 2 supplement). We agree with the reviewer that given the utility of CGRP monoclonal antibodies, migraine experiments would be interesting for future studies using these mice, a point we highlight in the discussion. Bone/joint pain is also clearly important from a translational perspective, but outside the scope of the current study.

      (4) A more minor but important point, the authors do not describe the nature of the WT animals used. Are the littermates or a separately maintained colony of WT animals? The WT strain background should be included in the methods section.

      The WT strain is C57BL/6J from Jackson Lab. This has been added to the methods.

      Reviewer #2 (Public Review):

      Summary:

      The paper aimed to examine the effect of co-ablating Substance P and CGRPα peptides on pain using Tac1 and Calca double knockout (DKO) mice. The authors observed no significant changes in acute, inflammatory, and neuropathic pain. These results suggest that Substance P and CGRPα peptides do not play a major role in mediating pain in mice. Moreover, they reveal that the lack of behavioral phenotype cannot be explained by the redundancy between the two peptides, which are often co-expressed in the same neuron.

      Strengths:

      The paper uses a straightforward approach to address a significant question in the field. The authors confirm the absence of Substance P and CGRPα peptides at the levels of DRG, spinal cord, and midbrain. Subsequently, they employ a comprehensive battery of behavioral tests to examine pain phenotypes, including acute, inflammatory, and neuropathic pain. Additionally, they evaluate neurogenic inflammation by measuring edema and extravasation, revealing no changes in DKO mice. The data are compelling, and the study's conclusions are well-supported by the results. The manuscript is succinct and well-presented.

      We thank the reviewer for their enthusiasm for the importance of our work.

      Reviewer #3 (Public Review):

      In this study, the authors assessed the role of a global double knockout of substance P and CGRPα in the transmission of acute and chronic pain. The authors first generated the double knockout (DKO) mice and validated their animal model. This was followed by a series of acute and chronic pain assessments to evaluate whether the global DKO of these neuropeptides is important in modulating acute and chronic pain behaviors. The authors found that, in these DKO mice, Substance P and CGRPα are not required for the transmission of acute and chronic pain, although both neuropeptides are strongly implicated in chronic pain. This study does provide more insight into the role of these neuropeptides in chronic pain processing; however, more work still needs to be done (see the comments below).

      We thank the reviewer for their detailed and constructive feedback, and below outline the steps we have taken to answer their concerns.

      (1) In assessing the double KO (result #1), why are different regions of the brain shown for substance P and CGRPα (for example, midbrain for substance P and amygdala for CGRPα)? Since the authors mentioned that these peptides are co-expressed in the brain (as in the introduction), shouldn't the same brain regions be shown for both IHC? It would be ideal if the authors could show both regions (midbrain and amygdala) in addition to the DRG and spinal cord for both peptides in their findings.
      In addition, since this is a double KO, the authors should show more representative IHC-stained brain regions (spanning from the anterior to posterior).

      We could not co-stain both SP and CGRP in the same sections as the DKO mouse has endogenous GFP and RFP fluorescence, limiting us to one channel (far red). Specifically, we use a Calca KO that is a Cre:GFP knock-in/knockout (Chen et al., 2018, PMID: 30344042) and the Tac1 KO is a tagRFP knock-in/knockout (Wu et al., 2018, PMID: 29485996). This is why we show different brain sections.

      (2) It is also unclear as to why the authors only assessed the loss of substance P signaling in the double KO mice. Shouldn't the same be done for CGRPα signaling? Either the authors assess this, or the authors have to provide clear explanations as to why only substance P signaling was assessed.

      As noted in our response to Reviewer 1, imaging of CGRP release is more challenging using the ‘sniffer’ approach because functional CGRP receptors require the expression of two genes: Calcrl (or Calcr) along with Ramp1. We have now generated this cell line and performed the experiment (see revised Figure 1 and Figure 1 Supplement).

      (3) Has the naturalistic behavior of these animals been assessed after the double KO (food intake, sleep, locomotion, for example)? I think this is important as changes to these naturalistic behaviors can affect pain processes or outcomes.

      We agree that assessment of naturalistic behavior including food intake, sleep and locomotion would be interesting to look at in DKO mice. However, our study is focused on acute and chronic pain behavior of these animals, and therefore a comprehensive phenotypic assessment of naturalistic home-cage behavior is outside the scope of our study.

      (4) Figure 2H: The authors acknowledge that there is a trend to decrease with capsaicin-evoked coping-like responses. However, a close look at the graph suggests that the lack of significance could be driven by 1 mouse. Have the authors run an outlier test? Alternatively, the authors should consider adding more n to these experiments to verify their conclusions.

      We were reluctant to add more animals searching for significance. Instead, we investigated the potential phenotype further by looking at cfos staining in the cord and found no differences (Figure 2, supplement 1). This result suggests loss of the two peptides does not grossly disrupt capsaicin evoked pain signal transmission between the nociceptor and post-synaptic dorsal neurons in the spinal cord.

      (5) Similarly, the values for WT in the evoked cFos activity (Figure 2- Suppl Figure 1) are pretty variable. Considering that the n number is low (n = 5), authors should consider adding more n.
      Also, since the n number is low in this experiment (eg. 5 vs 4), does this pass the normality test to run a parametric unpaired t-test? Either the authors increase their n numbers or run the appropriate statistical test.

      As described in the statistical tables, the Shapiro-Wilk test indicates these data do pass the normality test. Therefore, we retain the use of the unpaired t test, which demonstrates no significant difference between the groups.

      (6) In most of the results, authors ran a parametric test despite the low n number. Authors have to ensure that they are carrying out the appropriate statistical test for their dataset and n number.

      We now provide a table of the statistical results, which provides detailed information about all statistical tests performed in this study. For experiments where we make a single comparison between the two distributions (WT vs DKO), we have run a Shapiro-Wilk test. Where the data from both groups pass the normality test, we retain the use of the unpaired t test. Where the Shapiro-Wilk test indicates data from either group are unlikely to be normally distributed, we now use a Mann-Whitney U test to compare the groups, as this non-parametric test makes no assumptions about the underlying distribution.
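      The non-parametric fallback described above can be sketched for small groups as follows. This is only an illustration, not the analysis actually run for the paper: the group values are hypothetical, the function names are invented here, and the Shapiro-Wilk step is assumed to have been carried out separately in statistical software. For groups this small, the exact two-sided Mann-Whitney p-value can be obtained by enumerating every way of splitting the pooled data into two groups of the observed sizes.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count the (x, y) pairs in which x exceeds y; ties count one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_mann_whitney(x, y):
    """Return (U, exact two-sided p) by enumerating all relabellings of the pooled data."""
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    centre = n_x * n_y / 2.0  # expected U under the null hypothesis
    u_obs = u_statistic(x, y)
    observed_shift = abs(u_obs - centre)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabellings at least as far from the null centre as the data.
        if abs(u_statistic(group_x, group_y) - centre) >= observed_shift - 1e-12:
            extreme += 1
    return u_obs, extreme / total


# Hypothetical measurements for illustration only (not data from the study).
wt = [12.0, 15.5, 9.0, 30.0, 14.0]
dko = [11.0, 13.0, 10.5, 16.0]
u, p = exact_mann_whitney(wt, dko)
print(f"U = {u}, exact two-sided p = {p:.3f}")
```

      Enumeration is only practical for small groups such as 5 versus 4 animals; larger samples would normally rely on the normal approximation offered by statistical packages.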

      Many experiments involved two factors (genotype, and e.g. temperature, drug, time-point). These data were analyzed in the original submission using 2-WAY ANOVA or Repeated Measures 2-WAY ANOVA, followed by post-hoc Sidak’s tests to compute p values adjusted for multiple comparisons. Because there is no widely agreed non-parametric alternative to 2-WAY ANOVA for analyzing data with two factors and that enables us to account for multiple comparisons, we used 2-WAY ANOVA as is typically used in the field for these kinds of experiments. We reasoned sticking with the 2-WAY ANOVA was the best course of action based on information provided by the statistical software used for this study - https://www.graphpad.com/support/faq/with-two-way-anova-why-doesnt-prism-offer-a-nonparametric-alternative-test-for-normality-test-for-homogeneity-of-variances-test-for-outliers/
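      For readers unfamiliar with the multiplicity adjustment mentioned above, a minimal sketch of the single-step Sidak correction is given here. The raw p-values are hypothetical and the function name is an assumption made for this illustration; the software used for the study may compute the per-comparison p-values from the ANOVA error term before applying the same adjustment.

```python
def sidak_adjust(p_values):
    """Single-step Sidak adjustment: p_adj = 1 - (1 - p) ** m for m comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical raw p-values for WT vs DKO at three time points.
raw = [0.04, 0.30, 0.75]
for p_raw, p_adj in zip(raw, sidak_adjust(raw)):
    print(f"raw p = {p_raw:.2f} -> Sidak-adjusted p = {p_adj:.3f}")
```

      The adjustment can only increase each p-value, so a comparison that is non-significant before correction remains non-significant after it.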

      We note that regardless of the test, our conclusion that there are no major changes in acute or chronic pain behaviors is clear and strongly supported.

      (7) Along the same line of comment with the previous, authors should increase the n number for DKO for staining (Figure 4) as n number is only 3 and there is variability in the cFos quantification in the ipsilateral side.

      We believe this is not necessary as the finding is clear that there is no difference.

      (8) Authors should provide references for the statement made in Lines 319-321, where the authors mention that there is accumulating evidence indicating that secretion of these neuropeptides from nociceptor peripheral terminals modulates immune cells and the vasculature in diverse tissues.

      We now provide several references to primary papers and reviews supporting this statement.

      (9) Authors state that the sample size used was similar to those from previous studies, but no references were provided. Also, even though the sample sizes used were similar, I believe that the right statistical test should be used to analyze the data.

      We have now cited several classic studies phenotyping mouse KOs in pain in the methods that used similar sample sizes. As detailed above, we have taken the reviewer’s feedback on board and performed normality testing to ensure the correct statistical test is used for each experiment.

      (10) In the discussion, the authors noted that knocking out of a gene remains the strongest test of whether the molecule is essential for a biological phenomenon. At the same time, it was acknowledged that Substance P infusion into the spinal cord elicits pain, but it is analgesic in the brain. The authors might want to expand more on this discussion, including how we can selectively assess the role of these neuropeptides in areas of interest. For example, knocking out both Substance P and CGRPα in selected areas instead of the global KO since there are reported compensatory effects.

      This is highlighted in the closing paragraph: “Emerging approaches to image and manipulate these molecules (Girven et al., 2022; Kim et al., 2023), as well as advances in quantitating pain behaviors (Bohic et al., 2023; MacDonald and Chesler, 2023), may ultimately reveal the fundamental roles of neuropeptides in generating our experience of pain.” The Kim preprint (now published, and so the citation has been updated in the text) describes a method of inactivating neuropeptide transmission in select brain regions in a cell-type specific manner.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I do not have any major comments. My minor comments are as follows:

      (1) What was the control group for all behavioral studies? Was it WT from an independent colony or one of the littermates was used for generating controls?

      We used C57/Bl6 mice from Jax. This is now mentioned in methods.

      (2) In Fig. 2H, it seems that the effect will become significant if several mice are added.

      We are reluctant to add mice searching for significance. Sample sizes were determined before we collected the data blind.

      (3) There is no figure 3, but two figures 4.

      Thank you. This has been corrected.

      (4) Multiple typos in the legend for figure 4 (lines 234-254). Line 242 (& n=8 (3M, 3F)), line 243 (swelling and plasma), line 252 ((n=8 for) & n=6 for DKO (4M, 4F)).

      Thank you. This has been corrected.

      (5) In Figure 4 (lines 273-285), the contralateral side is mentioned in B but no images are shown.

      Thank you. We removed the mention.

      (6) Although ligand knockouts cannot be compared directly with receptor inhibition, the readers could benefit from discussing studies of receptor ablation and/or pharmacological inhibition.

      We do discuss the classic studies of receptor KO, and the clinical data on receptor blockers here –

      “However, selective antagonists of the Substance P receptor NKR1 failed to relieve chronic pain in human clinical trials (Hill, 2000). Although CGRP monoclonal antibodies and receptor blockers have proven effective for subsets of migraine patients, their usefulness for other types of pain in humans is unclear (De Matteis et al., 2020; Jin et al., 2018). In line with this, knockout mice deficient in Substance P, CGRPα or their receptors have been reported to display some pain deficits, but the analgesic effects are neither large nor consistent between studies (Cao et al., 1998; De Felipe et al., 1998; Guo et al., 2012; Salmon et al., 2001, 1999; Zimmer et al., 1998).” 

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 1E: What does chambers mean? Additionally, are the 12 chambers equally from the male and female samples (6 from male and 6 from female)?

      We have changed this to well. Each replicate is an individual well from an 8-well chamber slide. In all these experiments, the wells are approximately evenly distributed by mouse, because from each mouse we cultured around 8 wells’ worth of DRGs.

      (2) Figure 1D: What does low and high mean in the Hargreaves test?

      These refer to a low and a high active intensity of the radiant heat stimulus. The values, now described in the methods, are 40 and 55 in the intensity units used by the instrument.

      (3) Figure 2-Suppl Figure 1: Authors should provide a bigger version of the image so that it is clearer to the readers.

      We think the image is of a reasonable size and comparable to the images used elsewhere in the paper.

      (4) Authors should consider labeling their supplementary figures in running numbers or combining supplementary figures together to avoid confusion. For example, Figure 2-Supplementary Figure 1 and Figure 2- Supplementary Figure 2 can be combined as just Supplementary Figure 2.

      We agree with the reviewer this would be clearer, but we have followed eLife’s convention for labelling and numbering supplements.

      (5) Figure 3 is mislabeled as Figure 4.

      Thank you. We have corrected this.

      (6) Only female mice were used in the CFA experiment, which does not go in line with the rest of the results which consist of both sexes.

      We have repeated the experiment with additional male mice. To be consistent with the von Frey data, these were followed for 7 days, and so the figure now shows a 7-day time course.

      (7) Typo in line 243. The word "and" is subscript.

      Thank you. We have corrected this.

      (8) There is a typo in the legend for Figure 4 where E is labeled I, G is labeled as F, and J is labeled as J.

      Thank you. We have corrected this.

      (9) Authors should specify what "several weeks" means (Line 263).

      It means three weeks. We tested to 21 days. We will replace with three.

      (10) Authors should specify what "one day" means (Line 267). For example, how many days after the intraplantar oxaliplatin treatment? Also, authors should justify why that specific time point was selected or have a reference for it.

      This means one day after - 24 hours. Please see PMID: 33693512. Two references are provided in the methods.

      (11) Figure 4 legend: authors should again be specific on what "prolonged" entails (Line 277).

      We have replaced prolonged with 30 minutes brushing. Specifically, 3 x 10 min stim period, with 1 min rest between stim. It is in the methods.

      (12) In the methods section, authors state that both male and female mice were used for all experiments. However, only female mice were used in the CFA experiment (see minor comment #6). Authors should verify and correct this.

      This is correct. We only used female mice for one of the groups. We have since repeated with males, now included in the data.

      (13) Authors should be more specific in the methods section on how long the habituation lasted per day, over how many days, and what the mice were habituated to (experimenter, room, chamber, etc.).

      As noted in the methods, mice are habituated for at least an hour to the chambers, and thus implicitly to the room. We do not perform explicit habituation to the investigator such as repeated handling.

      (14) Authors need to provide more information on the semi-automated procedure they are referring to in Line 397. Also, authors should also provide the criteria for cFos quantification (eg. Intensity, etc). If this has been published before, they should provide the reference.

      We have added this. We used the ‘Find maxima’ and ‘Analyze particles’ functions in FIJI, followed by a manual curation step.

      (15) How much acetone was applied and how was it applied to the paw? (Line 495)

      We used the same applicator (1ml syringe with a well at the top) to generate a droplet of acetone that was used for all mice. This has been added to methods.

      (16) Authors should specify the amount of capsaicin injected in μl (Line 500).

      20 ul. We have added this.

      (17) Authors should explain or reference why they are analyzing the 15 min interval between 5 and 20 minutes after injection (Lines 507-508).

      Acetic acid behaviour lasts around 30 mins in our hands. We chose the 15 minute interval because it reduces burdensome hand scoring time by 50% versus doing the whole 30 mins. We reasoned that in the first 5 mins post injection the animal behaviour may be contaminated by stress related to handling, injection and return to chamber. Thus, 5 and 20 minutes provided a sensible time-frame for scoring the behavior when it is at its peak.

      (18) Authors have to provide more information/explanation on how they decide on the conditioned taste aversion protocol. Like why they do 30 mins exposure to a single water-containing bottle followed 90 mins exposure to both bottles. If this has been published before, they should provide the reference.

      We read dozens of different published protocols in the literature, and piloted one that was something of an amalgam of some of them with various adaptations of convenience. Because it worked on our first attempt, we stuck to it. The advantage of the CTA assay is it is incredibly robust to changes in the specificities of the paradigm, evincing the clear survival value of learning to avoid tastes that make you sick.

      (19) Authors again should provide more detail in their methods section.

      a. Specify the time frame that they are assessing here (Line 533).

      This can be seen in the Figure. 0 to 120 mins. We have added it to the methods.

      b. How long were the mice allowed to recover post-SNI before mechanical allodynia was assessed (Line 545)?

      This is apparent in the figures. 2 days to 21 days. We have added it to the methods.

      c. How much of the oxaliplatin was injected into the mice?

      40 ug / 40 ul (see PMID:33693512)

      Editor's note: Reviewers agreed that addressing the concerns about power, outliers, and statistics, as well as functional validation of CGRPα, would raise the strength of evidence to compelling, and inclusion of comparison to single KO would raise it to exceptional.

      Should you choose to revise your manuscript, please check to ensure full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

    1. eLife Assessment

      This important study provides convincing data from in vitro models and patient-derived samples to demonstrate how modulation of GSK3 activity can reprogram macrophages, revealing potential therapeutic applications in inflammatory diseases such as severe COVID-19. The study stands out for its clear and systematic presentation, strong experimental approach, and the relevance of its findings to the field of immunology.

    2. Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.
      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.
      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Rios and colleagues provides the scientific community with a compelling exploration of macrophage plasticity and its potential as a therapeutic target. By focusing on the GSK3-MAFB axis, the authors present a strong case for macrophage reprogramming as a strategy to combat inflammatory and fibrotic diseases, including severe COVID-19. Using a robust and comprehensive methodology, the study conducts broad transcriptomic and functional analyses and offers valuable mechanistic insights while highlighting its clinical relevance.

      Strengths:

      Well performed and analyzed

      Weaknesses:

      Additional analyses, including mechanistic studies, would increase the value of the study.

    4. Author response:

      Regarding a future revised version, we plan to:

      • refer to the "MoMac-VERSE" study according to the original report.

      • modify incorrectly formatted references.

      • modify the text to acknowledge the heterogeneity and variability in the response of primary cells to the GSK3 inhibitor.

      • improve the explanation of the reanalysis of single cell RNAseq data in Figure 7 (ref. 47, GSE120833), and re-adapt the graphs of the scRNA-Seq data using different plot parameters (e.g., reduction = "umap.scvi") to provide a more user-friendly visualization including bona fide macrophage markers for each subpopulation.

      • include statistical analyses in each one of the figure legends.

      • perform additional analyses (e.g., dose-response and kinetics of CHIR-99021 effects) and mechanistic studies (e.g., role of proteasome) to further dissect the re-programming ability of the GSK3/MAFB axis.

    1. eLife Assessment

      This study provides valuable insights into the behavioral, computational, and neural mechanisms of regime shift detection, by identifying distinct roles for the frontoparietal network and ventromedial prefrontal cortex in sensitivity to signal diagnosticity and transition probabilities, respectively. The findings are supported by solid evidence, including an innovative task design, robust behavioral modeling, and well-executed model-based fMRI analyses, though claims of neural selectivity would benefit from more rigorous statistical comparisons. Overall, this work advances our understanding of how humans adapt belief updating in dynamic environments and offers a framework for exploring biases in decision-making under uncertainty.

    2. Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      (1) The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experience. The authors discuss these differences comprehensively.

      (2) The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well.

      (3) The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      My major concern is about the correlational analysis in the section "Under- and overreactions are associated with selectivity and sensitivity of neural responses to system parameters", shown in Figures 5c and d (and similarly in Figure 6). The authors argue that a frontoparietal network selectively represents sensitivity to signal diagnosticity, while the vmPFC selectively represents transition probabilities. This claim is based on separate correlational analyses for red and blue across different brain areas. The authors interpret the finding of a significant correlation in one case (blue) and a non-significant correlation in the other (red) as evidence of a difference in correlations (between blue and red) but don't test this directly. This has been referred to as the "interaction fallacy" (Nieuwenhuis et al., 2011; Makin & Orban de Xivry 2019). Not directly testing the difference in correlations (but only the difference from zero for each case) can lead to wrong conclusions. For example, in Figure 5c, the correlation for red is r = 0.32 (not significantly different from zero) and the correlation for blue is r = 0.48 (different from zero). However, the difference between the two is only 0.16, and it is likely that this difference itself is not significant. From a statistical perspective, this corresponds to an interaction effect that has to be tested directly. It is my understanding that analyses in Figure 6 follow the same approach.

      Relevant literature on this point is:

      Nieuwenhuis, S, Forstmann, B & Wagenmakers, EJ (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nat Neurosci 14, 1105-1107. https://doi.org/10.1038/nn.2886

      Makin TR, Orban de Xivry, JJ (2019). Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175. https://doi.org/10.7554/eLife.48175

      There is also a blog post on simulation-based comparisons, which the authors could check out: https://garstats.wordpress.com/2017/03/01/comp2dcorr/
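      As an illustration of what a direct, simulation-based comparison could look like, the following is a minimal sketch of a percentile bootstrap for the difference between two correlations measured in the same subjects. It is not the authors' analysis and not necessarily the exact procedure described in the blog post; the function names, the synthetic data, and the choice of 2000 resamples are assumptions made for this example.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bootstrap_corr_difference(x1, y1, x2, y2, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% CI for r(x1, y1) - r(x2, y2).

    All four sequences are indexed by subject. Subjects are resampled
    as whole rows, so the dependence between the two correlations
    (same people in both conditions) is preserved.
    """
    rng = random.Random(seed)
    n = len(x1)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            r1 = pearson_r([x1[i] for i in idx], [y1[i] for i in idx])
            r2 = pearson_r([x2[i] for i in idx], [y2[i] for i in idx])
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance; draw again
        diffs.append(r1 - r2)
    diffs.sort()
    ci = (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])
    observed = pearson_r(x1, y1) - pearson_r(x2, y2)
    return observed, ci


# Synthetic example (hypothetical data, 30 subjects, two conditions).
_rng = random.Random(1)
behav_blue = [_rng.gauss(0, 1) for _ in range(30)]
neural_blue = [0.5 * b + _rng.gauss(0, 1) for b in behav_blue]
behav_red = [_rng.gauss(0, 1) for _ in range(30)]
neural_red = [0.3 * b + _rng.gauss(0, 1) for b in behav_red]

observed, ci = bootstrap_corr_difference(
    behav_blue, neural_blue, behav_red, neural_red
)
print(f"difference in r = {observed:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

      If the resulting interval includes zero, the data do not support the claim that the two correlations differ, even when only one of them is individually significant.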

      I recommend that the authors carefully consider what approach works best for their purposes. It is sometimes recommended to directly compare correlations based on Monte-Carlo simulations (cf Makin & Orban). It might also be appropriate to run a regression with the dependent variable brain activity (Y) and predictors brain area (X) and the model-based term of interest (Z). In this case, they could include an interaction term in the model:

      Y = \beta_0 + \beta_1 \cdot X + \beta_2 \cdot Z + \beta_3 \cdot X \cdot Z

      The interaction term reflects if the relationship between the model term Z and brain activity Y is conditional on the brain area of interest X.
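      To make the role of the interaction term concrete, here is a minimal sketch that fits the regression above by ordinary least squares on noise-free toy data. The coding of X as 0/1 for two brain areas, the helper names, and the chosen coefficients are assumptions for illustration only. A real analysis would also need standard errors for \beta_3 and would have to account for repeated measures within subjects (for example with a mixed model), which this sketch omits.

```python
def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def design_row(x, z):
    """Columns: intercept, brain area X, model term Z, interaction X*Z."""
    return [1.0, x, z, x * z]


# Toy data: X = 0 or 1 (two hypothetical brain areas), Z = model-based term.
true_beta = [1.0, 2.0, 0.5, 1.5]  # beta_0 .. beta_3, chosen arbitrarily
data = [(x, z) for x in (0.0, 1.0) for z in (-1.0, -0.5, 0.0, 0.5, 1.0)]
rows = [design_row(x, z) for x, z in data]
y = [sum(b * v for b, v in zip(true_beta, row)) for row in rows]

beta_hat = fit_ols(rows, y)
print("estimated interaction coefficient beta_3:", round(beta_hat[3], 6))
```

      With this coding, \beta_2 is the slope of Y on Z in the area coded 0, and \beta_2 + \beta_3 is the slope in the area coded 1, so a test of \beta_3 against zero is the direct test of whether the slopes differ between areas.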

      Another potential concern is that some important details about the parameter estimation for the system-neglect model are missing. In the respective section in the methods, the authors mention a nonlinear regression using Matlab's "fitnlm" function, but it remains unclear how the model was parameterized exactly. In particular, what are the properties of this nonlinear function, and what are the assumptions about the subject's motor noise? I could imagine that by using the built-in function, the assumption was that residuals are Gaussian and homoscedastic, but it is possible that the assumption of homoscedasticity is violated, and residuals are systematically larger around p=0.5 compared to p=0 and p=1.

      Relatedly, in the parameter recovery analyses, the authors assume different levels of motor noise. Are these values representative of empirical values?

      The main study is based on N=30 subjects, as are the two control studies. Since this work is about individual differences (in particular with respect to neural representations of noise and transition probabilities in the frontoparietal network and the vmPFC), I'm wondering how robust the results are. Is it likely that the results would replicate with a larger number of subjects? Can the two control studies be leveraged to address this concern to some extent?

      It seems that the authors have not counterbalanced the colors and that subjects always reported the probability of the blue regime. If so, I'm wondering why this was not counterbalanced.

    3. Reviewer #2 (Public review):

      Summary:

      This paper focuses on understanding the behavioral and neural basis of regime shift detection, a common yet hard problem that people encounter in an uncertain world. Using a regime-shift task, the authors examined cognitive factors influencing belief updates by manipulating signal diagnosticity and environmental volatility. Behaviorally, they have found that people demonstrate both over- and under-reaction to changes given different combinations of task parameters, which can be explained by a unified system-neglect account. Neurally, the authors have found that the vmPFC-striatum network represents current belief as well as belief revision unique to the regime detection task. Meanwhile, the frontoparietal network represents cognitive factors influencing regime detection, i.e., the strength of the evidence in support of the regime shift and the intertemporal belief probability. The authors further link behavioral signatures of system neglect with neural signals and have found dissociable patterns, with the frontoparietal network representing sensitivity to signal diagnosticity when the observation is consistent with regime shift and vmPFC representing environmental volatility, respectively. Together, these results shed light on the neural basis of regime shift detection, especially the neural correlates of bias in belief update that can be observed behaviorally.

      Strengths:

      (1) The regime-shift detection task offers a solid ground to examine regime-shift detection without the potential confounding impact of learning and reward. Relatedly, the system-neglect modeling framework provides a unified account for both over- and under-reacting to environmental changes, allowing researchers to extract a single parameter reflecting people's sensitivity to changes in decision variables and making it desirable for neuroimaging analysis to locate corresponding neural signals.

      (2) The analysis for locating brain regions related to belief revision is solid. Within the current task, the authors look for brain regions whose activation covaries with both current belief and belief change. Furthermore, the authors have ruled out the possibility of representing mere current belief or motor signal by comparing the current study results with two other studies. This set of analyses is very convincing.

      (3) The section on using neuroimaging findings (i.e., the frontoparietal network is sensitive to evidence that signals regime shift) to reveal nuances in behavioral data (i.e., belief revision is more sensitive to evidence consistent with change) is very intriguing. I like how the authors structure the flow of the results, offering this as an extra piece of behavioral findings instead of ad-hoc implanting that into the computational modeling.

      Weaknesses:

      (1) The authors have presented two sets of neuroimaging results, and it is unclear to me how to reason between these two sets of results, especially for the frontoparietal network. On one hand, the frontoparietal network represents belief revision but not variables influencing belief revision (i.e., signal diagnosticity and environmental volatility). On the other hand, when it comes to understanding individual differences in regime detection, the frontoparietal network is associated with sensitivity to change and consistent evidence strength. I understand that belief revision correlates with sensitivity to signals, but it can probably benefit from formally discussing and connecting these two sets of results in discussion. Relatedly, the whole section on behavioral vs. neural slope results was not sufficiently discussed and connected to the existing literature in the discussion section. For example, the authors could provide more context to reason through the finding that striatum (but not vmPFC) is not sensitive to volatility.

      (2) More details are needed for behavioral modeling under the system-neglect framework, particularly results on model comparison. I understand that this model has been validated in previous publications, but it is unclear to me whether it provides a superior model fit in the current dataset compared to other models (e.g., a model without \alpha or \beta). Relatedly, I wonder whether the final result section can be incorporated into modeling as well - i.e., the authors could test a variant of the model with two \betas depending on whether the observation is consistent with a regime shift and conduct model comparison.

    1. eLife Assessment

      This valuable study implicates a specific Wolbachia gene in driving the male-killing phenotype in a moth. This is a contribution to a growing body of literature from the authors in which they have nicely teased apart the loci responsible for male killing across diverse insects. The conclusions are supported by solid evidence.

    2. Reviewer #1 (Public review):

      Summary:

      Insects and their relatives are commonly infected with microbes that are transmitted from mothers to their offspring. A number of these microbes have independently evolved the ability to kill the sons of infected females very early in their development; this male killing strategy has evolved because males are transmission dead-ends for the microbe. A major question in the field has been to identify the genes that cause male killing and to understand how they work. This has been especially challenging because most male-killing microbes cannot be genetically manipulated. This study focuses on a male-killing bacterium called Wolbachia. Different Wolbachia strains kill male embryos in beetles, flies, moths, and other arthropods. This is remarkable because how sex is determined differs widely in these hosts. Two Wolbachia genes have been previously implicated in male-killing by Wolbachia: oscar (in moth male-killing) and wmk (in fly male-killing). The genomes of some male-killing Wolbachia contain both of these genes, so it is a challenge to disentangle the two.

      This paper provides strong evidence that oscar is responsible for male-killing in moths. Here, the authors study a strain of Wolbachia that kills males in a pest of tea, Homona magnanima. Overexpressing oscar, but not wmk, kills male moth embryos. This is because oscar interferes with masculinizer, the master gene that controls sex determination in moths and butterflies. Interfering with the masculinizer gene in this way leads the (male) embryo down a path of female development, which causes problems in regulating the expression of genes that are found on the sex chromosomes.

      Strengths:

      The authors use a broad number of approaches to implicate oscar, and to dissect its mechanism of male lethality. These approaches include: a) overexpressing oscar (and wmk) by injecting RNA into moth eggs, b) determining the sex of embryos by staining female sex chromosomes, c) determining the consequences of oscar expression by assaying sex-specific splice variants of doublesex, a key sex determination gene, and by quantifying gene expression and dosage of sex chromosomes, using RNASeq, and d) expressing oscar along with masculinizer from various moth and butterfly species, in a silkmoth cell line. This extends recently published studies implicating oscar in male-killing by Wolbachia in Ostrinia corn borer moths, although the Homona and Ostrinia oscar proteins are quite divergent. Combined with other studies, there is now broad support for oscar as the male-killing gene in moths and butterflies (i.e. order Lepidoptera). So an outstanding question is to understand the role of wmk. Is it the master male-killing gene in insects other than Lepidoptera and if so, how does it operate?

      Weaknesses:

      I found the transfection assays of oscar and masculinizer in the silkworm cell line (Figure 4) to be difficult to follow. There are also places in the text where more explanation would be helpful for non-experts.

    3. Reviewer #2 (Public review):

      Summary:

      Wolbachia are maternally transmitted bacteria that can manipulate host reproduction in various ways. Some Wolbachia induce male killing (MK), where the sons of infected mothers are killed during development. Several MK-associated genes have been identified in Homona magnanima, including Hm-oscar and wmk-1-4, but the mechanistic links between these Wolbachia genes and MK in the native host are still unclear.

      In this manuscript, Arai et al. show that Hm-oscar is the gene responsible for Wolbachia-induced MK in Homona magnanima. They provide evidence that Hm-Oscar functions through interactions with the sex determination system. They also found that Hm-Oscar disrupts sex determination in male embryos by inducing female-type dsx splicing and impairing dosage compensation. Additionally, Hm-Oscar suppresses the function of Masc. The manuscript is well-written and presents intriguing findings. The results support their conclusions regarding the diversity and commonality of MK mechanisms, contributing to our understanding of the mechanisms and evolutionary aspects of Wolbachia-induced MK.

      Comments on revisions:

      The authors have already addressed the reviewer's concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Overall, this is a clearly written manuscript with nice hypothesis testing in a non-model organism that addresses the mechanism of Wolbachia-mediated male killing. The authors aim to determine how five previously identified male-killing genes (encoded in the prophage region of the wHm Wolbachia strain) impact the native host, Homona magnanima moths. This work builds on the authors' previous studies in which:

      (1) they tested the impact of these same wHm genes via heterologous expression in Drosophila melanogaster, and

      (2) they also examined the activity of other male-killing genes (e.g., from the wFur Wolbachia strain in its native host: Ostrinia furnacalis moths).

      Advances here include identifying which wHm gene most strongly recapitulates the male-killing phenotype in the native host (rather than in Drosophila), and the finding that the Hm-Oscar protein has the potential for male-killing in a diverse set of lepidopterans, as inferred by the cell-culture assays.

      Strengths:

      Strengths of the manuscript include the reverse genetics approaches to dissect the impact of specific male-killing loci, and use of a "masculinization" assay in Lepidopteran cell lines to determine the impact of interactions between specific masc and oscar homologs.

      Weaknesses:

      It is clear from Figure 1 that the combinations of wmk homologs do not cause male killing on their own here. While I largely agree with the authors' conclusions that oscar is the primary MK factor in this system, I don't think we can yet rule out that wmk(s) may work synergistically or interactively with oscar in vivo. This might be worth a small note in the discussion (e.g., at line 294, "indicating that wmk likely targets factors other than masc": this could be downstream of the impacts of oscar, perhaps dependent on oscar-mediated impacts on masc first).

      Regarding the perceived male-bias in Figure 2a: I think readers might be interpreting "unhatched" as "total before hatching". You could eliminate ambiguity by perhaps splitting the bars into male and female, and then within a bar, coloring by hatched versus unhatched. But this is a minor point, and I think the updated text helps clarify this.

      The new Figure 4b looks to be largely redundant with the oscar information in Figure 1a.

      Updated statistical comparisons for the RNA-seq analysis are helpful. However, these analyses are based on single libraries (albeit each a pool of many individuals), so this is still a weaker aspect of the manuscript.

      The new information on masc similarity is useful (Fig 4d) - if the authors could please include a heatmap legend for the colors, that would be helpful. Also, please avoid green and red in the same figure when key for interpretation.

      Figure 1A "helix-turn-helix" is misspelled. ("tern").

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Insects and their relatives are commonly infected with microbes that are transmitted from mothers to their offspring. A number of these microbes have independently evolved the ability to kill the sons of infected females very early in their development; this male killing strategy has evolved because males are transmission dead-ends for the microbe. A major question in the field has been to identify the genes that cause male killing and to understand how they work. This has been especially challenging because most male-killing microbes cannot be genetically manipulated. This study focuses on a male-killing bacterium called Wolbachia. Different Wolbachia strains kill male embryos in beetles, flies, moths, and other arthropods. This is remarkable because how sex is determined differs widely in these hosts. Two Wolbachia genes have been previously implicated in male-killing by Wolbachia: oscar (in moth male-killing) and wmk (in fly male-killing). The genomes of some male-killing Wolbachia contain both of these genes, so it is a challenge to disentangle the two.

      This paper provides strong evidence that oscar is responsible for male-killing in moths. Here, the authors study a strain of Wolbachia that kills males in a pest of tea, Homona magnanima. Overexpressing oscar, but not wmk, kills male moth embryos. This is because oscar interferes with masculinizer, the master gene that controls sex determination in moths and butterflies. Interfering with the masculinizer gene in this way leads the (male) embryo down a path of female development, which causes problems in regulating the expression of genes that are found on the sex chromosomes.

      We would like to thank you for evaluating our manuscript.

      Strengths:

      The authors use a broad number of approaches to implicate oscar, and to dissect its mechanism of male lethality. These approaches include:

      (1) Overexpressing oscar (and wmk) by injecting RNA into moth eggs.

      (2) Determining the sex of embryos by staining female sex chromosomes.

      (3) Determining the consequences of oscar expression by assaying sex-specific splice variants of doublesex, a key sex determination gene, and by quantifying gene expression and dosage of sex chromosomes, using RNASeq.

      (4) Expressing oscar along with masculinizer from various moth and butterfly species, in a silkmoth cell line.

      This extends recently published studies implicating oscar in male-killing by Wolbachia in Ostrinia corn borer moths, although the Homona and Ostrinia oscar proteins are quite divergent. Combined with other studies, there is now broad support for oscar as the male-killing gene in moths and butterflies (i.e. order Lepidoptera). So an outstanding question is to understand the role of wmk. Is it the master male-killing gene in insects other than Lepidoptera and if so, how does it operate?

      Thank you for your comments. Wolbachia strains often carry wmk genes, but as observed in this study, the homologs in Homona showed no apparent MK ability. These showed strong male lethality in D. melanogaster, but it is still unclear whether the genes are the master male-killing gene in Diptera. It is also possible that the genes show toxicities in other lepidopteran insects as well as in other insect taxa. Further functional validation assays in different insects are warranted to clarify whether wmk shows toxicity in different insect taxa. We have also discussed the functions of wmk in the Discussion section (lines 301-306).

      Weaknesses:

      I found the transfection assays of oscar and masculinizer in the silkworm cell line (Figure 4) to be difficult to follow. There are also places in the text where more explanation would be helpful for non-experts (see recommendations).

      Thank you for your suggestion. We have thoroughly revised the manuscript to address all the questions, comments and suggestions you raised in “recommendations”. In particular, we have revised the section on the transfection assays of Oscar and Masc in BmN-4 cells (Results section “Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes”, starting on line 214, and Fig. 4; Materials and Methods section “Transfection assays and quantification of BmIMP<sup>M</sup>”, starting on line 483). We have also provided more detailed explanations for non-experts in some contexts (in response to your recommendation). We believe that the resulting revisions have significantly improved the quality and comprehensiveness of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Wolbachia are maternally transmitted bacteria that can manipulate host reproduction in various ways. Some Wolbachia induce male killing (MK), where the sons of infected mothers are killed during development. Several MK-associated genes have been identified in Homona magnanima, including Hm-oscar and wmk-1-4, but the mechanistic links between these Wolbachia genes and MK in the native host are still unclear.

      In this manuscript, Arai et al. show that Hm-oscar is the gene responsible for Wolbachia-induced MK in Homona magnanima. They provide evidence that Hm-Oscar functions through interactions with the sex determination system. They also found that Hm-Oscar disrupts sex determination in male embryos by inducing female-type dsx splicing and impairing dosage compensation. Additionally, Hm-Oscar suppresses the function of Masc. The manuscript is well-written and presents intriguing findings. The results support their conclusions regarding the diversity and commonality of MK mechanisms, contributing to our understanding of the mechanisms and evolutionary aspects of Wolbachia-induced MK.

      We would like to thank you for evaluating our manuscript.

      Strengths/weaknesses:

      (1) The authors found that transient overexpression of Hm-oscar, but not wmk-1-4, in Wolbachia-free H. magnanima embryos induces female-biased sex ratios. These results are striking and mirror the phenotype of the wHm-t infected line (WT12). However, Table 1 lists the "male ratio," while the text presents the "female ratio" with standard deviation. For consistency, the calculation term should be uniform, and the "ratio" should be listed for each replicate.

      We have revised the first results section (Hm-oscar induces female-biased sex ratios, starting from line 147) accordingly to maintain the consistency in the calculation term. In the revised manuscript, the 'male ratio' is now consistently used, in alignment with Fig. 1. In addition, we have included all sex ratio information (number of males and females) in the supplementary data file for transparency and clarity.

      (2) The error bars in Figure 3 are quite large, and the figure lacks statistical significance labels. The authors should perform statistical analysis to demonstrate that Hm-oscar-overexpressed male embryos have higher levels of Z-linked gene expression.

      The large error bars on each chromosome (Fig. 3a-d) likely reflect the overall variation in expression levels across different transcripts. Accordingly, we have included statistical data for Figure 3 based on the Steel-Dwass test for expression levels. However, displaying statistical significance directly on the whisker plots would make the figure too cluttered due to the numerous combinations. Instead, we have provided all the statistical data in the supplementary data file. To further support the claim that Z-linked genes are more highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included the expression data for a Z-linked gene tpi, along with its statistical data in the revised manuscript (Fig. 3e, lines 210-212).

      (3) The authors demonstrated that Hm-Oscar suppresses the masculinizing functions of lepidopteran Masc in BmN-4 cells derived from the female ovaries of Bombyx mori. They should clarify why this cell line was chosen and its biological relevance. Additionally, they should explain the rationale for evaluating the expression levels of the male-specific BmIMP variant and whether it is equivalent to dsx.

      Thank you for your suggestion. We selected the BmN-4 cell line because previous studies have established it as a reliable model for investigating the functions of lepidopteran masc genes and the interactions between masc and Oscar genes (Katsuma et al., 2019; 2022). In addition, BmIMP<sup>M</sup> is a male-specific regulator of the male-type dsx, making it an ideal target for assessing the 'maleness' induced by transfection of the masc gene in female-derived BmN-4 cells (Suzuki et al., 2010; Katsuma et al., 2015). We have included more detailed background information in the revised manuscript and have thoroughly revised this section (Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes, starting at line 214) and Figure 4 for better clarity.

      (4) Although the authors show that Hm-oscar is involved in Wolbachia-induced MK in Homona magnanima and interacts with the sex determination system in lepidopteran insects, the precise molecular mechanism of Hm-oscar-induced MK remains unclear. Further studies are needed to elucidate how Hm-oscar regulates Homona magnanima genes to induce MK, though this may be beyond the scope of the current manuscript.

      Based on our findings and previous studies in Homona, Ostrinia and Bombyx (Arai et al., 2023a; Katsuma et al., 2023; Kiuchi et al., 2014), we hypothesize that the molecular mechanisms underlying wHm-induced MK are likely linked to impaired dosage compensation caused by the inhibition of Masc function by the Hm-Oscar protein. While the precise mechanisms remain unclear, unbalanced Z-linked gene expression due to the impaired dosage compensation (i.e., 2-fold higher Z-linked gene expression compared to normal males) is known to be lethal for lepidopteran males (Kiuchi et al., 2014; Fukui et al., 2015; Visser et al., 2021). We have outlined this hypothesis in the Discussion section (lines 245-254).

      Reviewer #3 (Public review):

      Summary:

      Overall, this is a clearly written manuscript with nice hypothesis testing in a non-model organism that addresses the mechanism of Wolbachia-mediated male killing. The authors aim to determine how five previously identified male-killing genes (encoded in the prophage region of the wHm Wolbachia strain) impact the native host, Homona magnanima moths. This work builds on the authors' previous studies in which:

      (1) They tested the impact of these same wHm genes via heterologous expression in Drosophila melanogaster.

      (2) They examined the activity of other male-killing genes (e.g., from the wFur Wolbachia strain in its native host: Ostrinia furnacalis moths).

      Advances here include identifying which wHm gene most strongly recapitulates the male-killing phenotype in the native host (rather than in Drosophila), and the finding that the Hm-Oscar protein has the potential for male-killing in a diverse set of lepidopterans, as inferred by the cell-culture assays.

      Strengths:

      Strengths of the manuscript include the reverse genetics approaches to dissect the impact of specific male-killing loci, and the use of a "masculinization" assay in Lepidopteran cell lines to determine the impact of interactions between specific masc and oscar homologs.

      We would like to thank you for evaluating our manuscript.

      Weaknesses:

      My major comments are related to the lack of statistics for several experiments (and the data normalization process), and opportunities to make the manuscript more broadly accessible.

      Thank you for your suggestions. We have thoroughly revised the manuscript to provide clearer explanations for non-experts. In addition, we have included more detailed statistical data for Figure 3 and Figure 4 based on the Steel-Dwass tests. For Figure 3a-d, displaying statistical significance directly on the whisker plots would make the figure too cluttered due to the numerous combinations. Therefore, we have provided all the statistical data in the supplementary data file. To further support the claim that Z-linked genes are more highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included the expression data for a Z-linked gene tpi, along with its statistical data in the revised manuscript (Fig. 3e, lines 210-212). Regarding Figure 4, we have revised the figure based on the reviewer’s suggestions, and provided more detailed information on how the expression data were analyzed (Transfection assays and quantification of BmIMP<sup>M</sup>, lines 495-520). We have also included more detailed background information on the assay system (Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes, lines 215-237). Although we did not observe statistical significance based on the Steel-Dwass test, likely due to limited replicates, the observed changes in the IMP gene expression remain clearly evident.

      The manuscript I think would be much improved by providing more details regarding some of the genes and cross-lineage comparisons. I know some of this is reported in previous publications, but some summary and/or additional analysis would make this current manuscript much more approachable for a broader audience, and help guide readers to specific important findings. For example, a graphic and/or more detail on how the wmk/oscar homologs (within and across Wolbachia strains) differ (e.g., domains, percent divergence, etc) would be helpful for contextualizing some of the results. I recognize the authors discuss this in parts (e.g., lines 223-227), but it does require some bouncing between sections to follow. Similarly, the experiments presented in Figure 4 indicate that Hm-oscar has broad spectrum activity: how similar are the masc proteins from these various lepidopterans? Are they highly conserved? Rapidly evolving? Do the patterns of masc protein evolution provide any hints at how Oscar might be interacting with masc?

      Thank you for your valuable suggestion. To address this, we have included a visualization of the structural differences between the Oscar and wmk homologs in Figure 1a of the revised manuscript. In addition, we have included more detailed information for these genes and revised the introduction (lines 110-114; 124-137) and discussion (lines 255-266) to provide a clearer and more comprehensive overview. We have also described the similarity of the Masc proteins and Oscar proteins that we used, which is now reflected in the revised Figure 4b and 4d. More detailed information on these proteins is available in the supplementary data. Notably, Masc proteins exhibit high sequence variability with conserved domains (Figure 4d). A previous study identified the N-terminal region of Masc as crucial for the Oscar function (Katsuma et al., 2022). The wide spectrum of the actions of Hm-Oscar likely stems from these conserved structures of Masc, but the effects might have undergone evolutionary tuning through interactions with the native host, as discussed in lines 293-294.

      It is clear from Figure 1 that the combinations of wmk homologs do not cause male killing on their own. Did the authors test if any of the wmk homologs impact the MK phenotype of oscar? It looks like a previous study tested this in wFur (noted in lines 250-252), but given that the authors also highlight the differences between the wFur-oscar and Hm-oscar proteins, this may be worth testing in this system. Related to this, what is the explanation for why there would be 4 copies of wmk in Hm?

      Thank you for your valuable suggestion. Unfortunately, we have not yet tested the effects of co-expression of wmk and Oscar. Due to a technical issue, mixing multiple constructs reduces the amount of mRNA for each (i.e., mixing wmk-3 and Hm-Oscar at the same concentration results in a 2-fold lower mRNA concentration for both genes compared to mono-injected groups). In addition, we have previously tested injecting mRNA at a twofold higher concentration (i.e., 2 µg/µl mRNA), which resulted in very low hatchability regardless of the genes. Katsuma et al. (2022) tested the effect of wmk on the sex determination system, but did not test the effect of co-injection/transfection of wmk and Oscar. Considering the results of this and previous studies (Katsuma et al., 2022; Arai et al., 2023), it is likely that the targets of the wmk and oscar genes are different (as discussed in lines 267-289). Co-injection of wmk and oscar may therefore not produce additive effects. Nevertheless, we would like to test this in future studies, using the Drosophila system as well.

      As you point out, it is interesting that the moth-derived MK Wolbachia wHm-t encodes four wmk genes, although they have no apparent effect on host survival. The exact functional relevance of these wmk homologs remains unclear. However, they may play a role in Wolbachia biology as transcriptional regulators, given that they encode HTH domains. Wolbachia generally encode several wmk homologs in their genome, regardless of whether they induce MK. This suggests that the functions of the wmk genes may be 'suppressed' in certain Wolbachia-host systems. The wmk and Hm-oscar genes are located within a prophage region, and some wmk genes are tandemly arrayed (as described in Arai et al., 2023). These wmk homologs may have increased in number by horizontal phage transfer, and the region containing wmk and adjacent sequences may act as a genomic island for virulence. So far, the function of wmk homologs has only been tested in D. melanogaster and H. magnanima, and further studies in other Wolbachia-host systems are highly warranted to test whether wmk exerts MK effects in other insect models. These points have been briefly discussed in the revised manuscript (lines 301-306; 318-320).

      Why are some of the broods male-biased (2/3) rather than ~50:50? (Lines 170-175, Figure 2a). For example, there is a strong male bias in un-hatched oscar-injected and naturally infected embryos, whereas the control uninfected embryos have normal 50:50 sex ratios. It is difficult to interpret the rate of male-killing given that the sex ratios of different sets of zygotes are quite variable.

      The observed male-biased sex ratios in unhatched embryos are due to the occurrence of MK during embryogenesis. In the unhatched groups, the skew towards males reflects the fact that the male embryos were targeted and killed by Wolbachia/Oscar, leading to a surplus of unhatched male embryos. Conversely, hatched individuals show a higher proportion of females because many of the males were eliminated during embryogenesis. Thus, the unhatched embryos are more male-biased, while the hatched individuals are more female-biased in the Hm-oscar/wHm-t treated groups. We have revised the relevant section (Males are killed mainly at the embryonic stage, lines 179-186) and provided more detailed information to clarify this explanation.
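
      The arithmetic behind this explanation can be illustrated with a minimal sketch. The function name, the brood size, the background mortality, and the male-killing rate below are all hypothetical values chosen for illustration; none of them are taken from the manuscript or its data.

```python
def male_fractions(n_male, n_female, background_death, male_kill):
    """Return (male fraction among unhatched, male fraction among hatched).

    background_death: sex-independent probability that an embryo fails to hatch.
    male_kill: probability that a male which escaped background death is killed.
    All parameters are illustrative assumptions, not measured values.
    """
    for p in (background_death, male_kill):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Males die from background mortality or, failing that, from male killing.
    unhatched_m = n_male * (background_death + (1 - background_death) * male_kill)
    # Females die from background mortality only.
    unhatched_f = n_female * background_death
    hatched_m = n_male - unhatched_m
    hatched_f = n_female - unhatched_f
    unhatched_total = unhatched_m + unhatched_f
    hatched_total = hatched_m + hatched_f
    if unhatched_total == 0 or hatched_total == 0:
        raise ValueError("both the hatched and unhatched classes must be non-empty")
    return unhatched_m / unhatched_total, hatched_m / hatched_total


# Hypothetical brood: 100 male and 100 female embryos, 20% background
# mortality, and 80% of the remaining males killed during embryogenesis.
unhatched, hatched = male_fractions(100, 100, 0.2, 0.8)
print(f"male fraction, unhatched: {unhatched:.2f}")
print(f"male fraction, hatched:   {hatched:.2f}")
```

      With these assumed numbers, a brood that starts at 50:50 yields a strongly male-biased unhatched class and a female-biased hatched class, while setting male_kill to 0 returns 0.5 for both classes.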

      Figure 2b - it appears there are both male and female bands in the HmOsc male lane. I think this makes sense (likely a partial phenotype due to the nature of the overexpression approach), but this is worth highlighting, especially in the context of trying to understand how much of the MK phenotype might be recapitulated through these methods. Related, there is no negative control for this PCR.

      Thank you for your suggestion. As you noted, a faint dsx-M band is visible in the Hm-oscar treated group in Figure 2b. This is consistent with previous findings by Arai et al. (2023), which reported that male embryos with low-density wHm-t showed double bands of dsx-M and dsx-F, similar to what we observed in this study. This information has been included in the revised manuscript in lines 196-198, as follows:

      “Notably, male embryos expressing Hm-oscar also exhibited weak male-type dsx splicing in addition to the female-type splicing, resembling the previously observed pattern in male embryos infected with low-titer wHm-t (Arai et al., 2023a).”

      We also appreciate your comment regarding the missing negative control. The figure has now been revised, as we realised that the negative control lane had been lost during the preparation of the figure. We also included the relevant molecular marker information in both the figure legends and Figure 2b.

      It appears the RNA-seq analysis (Figure 3) is based on a single biological replicate for each condition. And, there are no statistical comparisons that support the conclusions of a shift in dosage compensation. Finally, it is unclear what exactly is new data here: the authors note "The expression data of the wHm-t-infected and non-infected groups were also calculated based on the transcriptome data included in Arai et al. (2023a)" - So, are the data in Figure 3c and 3d a re-print of previous data? The level of dosage compensation inferred by visually comparing the control conditions in 3b and 3d does not appear consistent. With only one biological replicate library per condition, what looks like a re-print of previous data, and no statistical comparisons, this is a weakly supported conclusion.

      Thank you for your suggestion. In this study, we generated the RNA-seq data for the Hm-oscar/GFP-injected groups, but did not sequence the wHm-t-infected/NSR lines. Instead, the previously generated RNA-seq data of wHm-t-infected/NSR (Arai et al., 2023) were re-analyzed (rather than simply reprinted) to evaluate whether the expression patterns of Hm-oscar-injected and wHm-t-infected groups are similar. We have revised the Results section (Hm-oscar impairs dosage compensation in male embryos, lines 200-212), the Materials and methods section (Quantification of Z chromosome-linked genes, lines 454-456), and the figure legends to provide more precise information about this analysis.

      Although we did not perform replicates for the RNA-seq comparisons, it is important to note that each RNA-seq sample contains 50-60 male/female individuals. We believe the results are still robust and clearly indicative of the trends we observe. This was further supported by the quantification of Hmtpi gene expression, which we have visualized in Figure 3e (and lines 210-212). As you noted, the expression patterns in Figure 3b (GFP injected) and Figure 3d (NSR) are not completely identical. This discrepancy may be due to the differences between injection treatments and natural infections. Nevertheless, both treatments are consistent in showing that gene expressions on the Z chromosome (Chr01 and Chr15) are not upregulated.

      We have also added more detailed statistical data for Figure 3 based on the Steel-Dwass tests. For Figure 3a-d, however, showing the statistical significance directly on the whisker plots would create excessive clutter due to the numerous combinations of chromosomes. Instead, we have provided the full statistical data in the supplementary data file. Furthermore, to support/strengthen our conclusion that Z-linked genes are highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included expression data for the Z-linked gene tpi, along with statistical data, in the revised manuscript (Fig. 3e, lines 210-212).

      In Figure 4: There are no statistics to support the conclusions presented here. Additionally, the data have gone through a normalization process, but it is difficult to follow exactly how this was done. The control conditions appear to always be normalized to 100 ("The expression levels of BmImpM in the Masc and Hm-Oscar/Oscar co-transfected cells were normalized by setting each Masc-transfected cell as 100"). I see two problems with this approach:

      (1) This has eliminated all of the natural variation in BmImpM expression, which is likely not always identical across cells/replicates.

      (2) How then was the percentage of BmImpM calculated for each of the experimental conditions? Was each replicate sample arbitrarily paired with a control sample? This can lead to very different outcomes depending on which samples are paired with each other. The most appropriate way to calculate the change between experimental and control would be to take the difference between every single sample (6 total, 3 control, 3 experimental) and the mean of the control group. The mean of the control can then be set at 100 as the authors like, but this also maintains the variability in the dataset and then eliminates the issue of arbitrary pairings. This approach would also then facilitate statistical comparisons which is currently missing.
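
      The normalization proposed in (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function name and the expression values are hypothetical, and only the idea of scaling every sample by the mean of the control group comes from the review.

```python
from statistics import mean, stdev


def normalize_to_control_mean(control, experimental, target=100.0):
    """Scale every sample so that the mean of the control group equals `target`.

    Each replicate is divided by the same control mean, so no replicate is
    paired with an arbitrary control and the spread within both groups is kept.
    """
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    scale = target / control_mean
    return [v * scale for v in control], [v * scale for v in experimental]


# Hypothetical relative expression values, three replicates per group.
masc_only = [0.9, 1.0, 1.1]
masc_plus_oscar = [0.20, 0.30, 0.25]

ctrl_norm, exp_norm = normalize_to_control_mean(masc_only, masc_plus_oscar)
print(f"control:      mean={mean(ctrl_norm):.1f}, sd={stdev(ctrl_norm):.1f}")
print(f"experimental: mean={mean(exp_norm):.1f}, sd={stdev(exp_norm):.1f}")
```

      Because the control replicates keep their own variance after scaling, both groups can then be passed to whichever statistical test is appropriate for the design.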

      Thank you for your suggestion. As you pointed out in (1), the previous analysis did indeed eliminate the natural variation in BmIMP-M expression. In the revised manuscript and Figure 4, we have reanalyzed the data following your suggestion and have described the variation across replicates.

      For (2), the data shown in the previous manuscript were normalized to 100 for each Masc-treated group. In doing so, each replicate sample was arbitrarily paired with a control sample from the same cell lot to account for variations that might occur due to differences in cell lots. However, following your recommendation, we have revised the figure to set the average of the Hm-masc treated group to 100, rather than using arbitrary pairings. More detailed normalization procedures have been provided in the section 'Transfection assays and quantification of BmIMP' (lines 483-520). Additionally, we have provided more detailed background information on the assay system in lines 218-223. Although we did not observe statistical significance based on the Steel-Dwass test, likely due to the limited number of replicates, the differences in IMP gene expression between the Masc-treated and Masc&Hm-oscar-treated groups remain evident.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 38: change to: 'Wolbachia are maternally transmitted'.

      Revised accordingly (line 38).

      Line 69: remove 'seemingly'.

      Revised accordingly (line 69).

      Paragraph starting line 123: I don't think this is so clear to a reader who is not familiar with the work and system. It would be helpful to more clearly explain that candidate male-killing genes from Wolbachia that infect Homona were inserted into Drosophila melanogaster, and that their expression was then induced, with interesting patterns (and that it can be a bit difficult to interpret the transgenic expression of genes from a moth male-killer that are inserted into a fly). Also, the sentence about the combined action of cifA and cifB in Drosophila cytoplasmic incompatibility is also confusing to a non-expert. I would suggest removing it.

      Thank you for your suggestion. We have revised the paragraph (lines 124-139) to provide clearer background information, making it easier for non-experts to follow. We have also removed the sentence regarding the combined effect of cifA and cifB to improve the flow and overall clarity.

      Line 170: what is the explanation for the male-biased sex ratio instead of 50-50?

      The male-biased sex ratio occurs because MK happens during embryogenesis. Unhatched embryos include males that were killed by Wolbachia/Oscar, resulting in a higher proportion of unhatched male embryos. Conversely, the hatched individuals display a female bias, as most of the males were eliminated during embryogenesis. Thus, the unhatched embryos are more male-biased, while the hatched individuals are more female-biased in the Hm-oscar/wHm-t treated groups. We have revised the section “Males are killed mainly at the embryonic stage” (lines 170-186) to include more detailed information explaining this phenomenon.

      Line 190: please explain what are the Z chromosomes in Bombyx and Homona and Lepidoptera in general (chromosomes 1 and 15?), as this is not so clear for a non-expert.

      Thank you for your suggestion. We have revised the section (lines 200-212) to include more precise background information about the chromosome constitutions in lines 202-204, as follows:

      “Unlike other lepidopteran species, Tortricidae, including H. magnanima, generally possess a large Z chromosome that is homologous to B. mori chromosomes 1 (Z) and 15 (autosome).”

      Line 222: please explain oscar diversity and classification in more detail, as this is not so clear for a non-expert.

      Thank you for your suggestion. We have revised the sentences to provide clearer background information on the diversity of oscar genes (lines 255-264).

      Figure 4: I found this difficult to follow. Why are there 2 rows (HmOscar and Oscar)? Does oscar here refer to oscar from Ostrinia? I am also a bit confused about the baseline control of Masc in these cell lines. If I understand Lepidoptera sex determination, then these cell lines are expressing high levels of female-specific piRNAs that suppress Masc. How specific are these piRNAs (i.e. do Bombyx piRNAs suppress Mascs from other Lepidoptera)? How much extra Masc will override endogenous piRNA? Information is lost by setting Masc expression to 100% in each separate comparison.

      Yes, Oscar indicates the wFur-encoded oscar (Oscar from Ostrinia), which was tested for functional comparison with the Homona-derived Hm-oscar gene. In addition, following the reviewer's suggestions, we have revised the figure and included more detailed information on how we adjusted the expression levels in the Materials and methods section.

      A previous study (Shoji et al., 2017, RNA 23:86–97) demonstrated that the Fem piRNA (29 bp) in Bombyx mori requires a 17 bp complementary sequence from its 5' region for its function. However, in species other than B. mori, no significant homology (i.e., over 17 bp matches) was found between the B. mori Fem piRNA and the masc genes analyzed in this study. Therefore, it is likely that the Fem piRNA expressed in BmN-4 cells is unable to suppress the masculinizing function driven by masc genes in other lepidopteran species. In addition, we did not quantify the levels of piRNA in this system, but the expression levels of masc are probably too high to be suppressed.
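
      The kind of complementarity check described above can be sketched as follows. This is a simplified illustration, not the search that was actually performed: the function names and the sequences are made up, and only the 17-nt length taken from the 5' end of the piRNA follows the text. Requiring a perfect match is an additional assumption of this sketch.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Upper-case a sequence and write it in the RNA alphabet (T becomes U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return to_rna(rna).translate(COMPLEMENT)[::-1]


def has_seed_target(pirna, target, seed_len=17):
    """Check whether `target` contains a site perfectly complementary to the
    first `seed_len` nucleotides (5' end) of `pirna`."""
    seed = to_rna(pirna)[:seed_len]
    if len(seed) < seed_len:
        raise ValueError("piRNA is shorter than the requested seed length")
    return reverse_complement(seed) in to_rna(target)


# Made-up 29-nt piRNA and two made-up transcripts, for illustration only.
pirna = "AUGC" * 7 + "A"
matching_target = "GGG" + reverse_complement(pirna[:17]) + "CCC"
unrelated_target = "A" * 40

print(has_seed_target(pirna, matching_target))
print(has_seed_target(pirna, unrelated_target))
```

      A real analysis would run such a scan over the annotated masc transcripts of each species and would probably also need to tolerate mismatches, which this sketch does not attempt.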

      Figure 4 legend: spelling of Spodoptera.

      Revised accordingly.

      Reviewer #2 (Recommendations for the authors):

      In Figure 2, what is the dsx splicing type for the hatched male in the Hm-oscar-injected group and the wHm-t infected line? Dsx-F or dsx-M?

      Thank you for your suggestion. Unfortunately, we have not tested splicing in the hatched male neonates (1st instar larvae), partly due to difficulties in obtaining sufficient material for RNA extraction. Based on the previous publication in the Ostrinia system, where Oscar-bearing wSca induces MK, the hatched males (ZZ) exhibit female-type dsx, as observed in the male embryos (Herran et al., 2022). The hatched Homona males may show double bands for dsx-M and dsx-F, as observed in this study.

      The size of the markers (in kilobase pairs) should be indicated in Figure 2.

      We have accordingly included the marker information in the revised Figure 2b and the figure legends.

      In Figure 3, could the authors identify which genes exhibit higher expression levels in the Hm-oscar-injected group and the wHm-t infected line? Could they provide hints for the possible mechanism of male-killing?

      In the RNA-seq data shown in Figure 3a-d, we observed that both the Hm-oscar-injected and wHm-infected groups generally exhibited upregulated expression of Z-linked genes. Rather than the upregulation or downregulation of a specific gene, we consider that global upregulation of Z-linked genes, caused by improper dosage compensation, is lethal for males. The Z chromosome contains various genes involved in key biological processes such as endocrine function and detoxification, and disruption of these processes may contribute to male lethality. Additionally, in this revised manuscript, we have provided more detailed information on the expression level of the Z-linked gene tpi. We have also discussed the potential mechanisms of MK in the Discussion section (lines 245-254).

      The format of the references should be consistent. Gene and species names should be italicized.

      We have formatted them accordingly.

      Reviewer #3 (Recommendations for the authors):

      The authors use the term "upstream" (e.g., Oscar suppressed the function of masculinizer, the upstream male sex determinant...), which was sometimes confusing. In many cases, it reads as though the masculinizer was upstream of oscar, but what I think the authors are trying to convey is that masculinizer is a primary sex-determining factor.

      Thank you for your suggestion. We have accordingly revised the term.

      Line 101: which insect is wFur from?

      It is from Ostrinia furnacalis - line 104 has been revised.

      Figure 1: it would be helpful to indicate the statistical results on the figure.

      Accordingly, we have added statistical data (binomial test) for Figure 1. The data for the Steel-Dwass test have been included in the supplementary data.
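
      For readers unfamiliar with the binomial test, a minimal sketch of an exact two-sided version is given below. It assumes that the null hypothesis is a 1:1 sex ratio, which the response does not state explicitly, and the function name and counts are hypothetical.

```python
from math import comb


def binomial_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test.

    Returns the probability, under success probability `p`, of any outcome
    that is no more likely than observing `k` successes in `n` trials.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are not lost to rounding error.
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical count: 70 females among 100 adults, tested against 1:1.
print(binomial_test_two_sided(70, 100))
```

      A small p-value here would indicate a departure from the assumed 1:1 ratio in either direction; it does not by itself identify the cause of the bias.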

      Figure 2b: please label the ladder on the gel.

      Thank you for your suggestion. We have accordingly labeled the DNA ladder on the gel.

    1. eLife Assessment

      This study provides compelling data regarding the molecular characterization of a rare tumor type with few treatment options. This fundamental work significantly advances our mechanistic understanding of solitary fibrous tumours, a critical first step towards targeted precision medicine approaches. The results of this study will be of broad interest to cancer biologists and experimental oncologists.

    2. Joint Public Review:

      Solitary Fibrous Tumors (SFTs) are a rare malignancy defined by NAB2-STAT6 fusions. Because the molecular understanding of the disease is largely lacking, there are currently no targeted treatment approaches. Using primary tumor and adjacent normal tissue samples and cells inducibly expressing NAB2-STAT6, Hill et al. perform a detailed characterization of the transcriptomic and epigenomic NAB2-STAT6 SFT signatures. They identify enrichment of EGR1/NAB2 (but not STAT6) sites bound by the fusion protein and increased expression of EGR1 targets. Their studies indicate that NAB2-STAT6 fusion may direct the nuclear translocation of NAB2 and EGR1 proteins and potentially NAB1. Transcriptionally, NAB2-STAT6 SFTs most closely resemble neuroendocrine tumors.

      This pioneering study provides critical insight into the molecular pathogenesis of SFTs, pivotal for the future development of mechanistically informed treatment approaches. The study is rigorously executed and well-written. This new knowledge is an important addition to the field.

    1. eLife Assessment

      This valuable contribution combines high-resolution histology with magnetic resonance imaging in a novel way to study the organisation of the human amygdala. The main findings convincingly show the axes of microstructural organisation within the amygdala and how they map onto the functional organisation. Overall, the approach taken in this paper showcases the utility of combining multiple modalities at different spatial scales to help understand brain organisation.

    2. Reviewer #1 (Public review):

      The paper by Auer et al. makes several contributions:

      (1) The study developed a novel approach to map the microstructural organization of the human amygdala by applying radiomics and dimensionality reduction techniques to high-resolution histological data from the BigBrain dataset.

      (2) The method identified two main axes of microstructural variation in the amygdala, which could be translated to in vivo 7 Tesla MRI data in individual subjects.

      (3) Functional connectivity analysis using resting-state fMRI suggests that microstructurally defined amygdala subregions had distinct patterns of functional connectivity to cortical networks, particularly the limbic, frontoparietal, and default mode networks.

      (4) Meta-analytic decoding was used to suggest that the superior amygdala subregion's connectivity is associated with autobiographical memory, while the inferior subregion was linked to emotional face processing.

      (5) Overall, the data-driven, multimodal approach provides an account of amygdala microstructure and possibly function that can be applied at the individual subject level, potentially advancing research on amygdala organization.

    3. Reviewer #2 (Public review):

      Summary:

      This study bridges a micro- to macroscale understanding of the organization of the amygdala. First, using a data-driven approach, the authors identify structural clusters in the human amygdala from high-resolution post-mortem histological data. Next, they use multimodal imaging data to identify structural subunits of the amygdala and the functional networks in which they are involved. This approach is exciting because it permits the identification of both structural amygdalar subunits, and their functional implications, in individual subjects. There are, however, some differences in the macro and microscale levels of organization that should be addressed.

      Strengths:

      The use of data-driven parcellation on a structure that is important for human emotion and cognition, and the combination of this with high-resolution individual imaging-based parcellation, is a powerful and exciting approach, addressing both the need for a template-level understanding of organization as well as a parcellation that is valid for individuals. The functional decoding of rsfMRI permits valuable insight into the functional role of structural subunits. Overall, the combination of micro to macro, structure, and function, and general organization to individual relevance is an impressive holistic approach to brain mapping.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The paper by Auer et al. makes several contributions: (1) The study developed a novel approach to map the microstructural organization of the human amygdala by applying radiomics and dimensionality reduction techniques to high-resolution histological data from the BigBrain dataset. (2) The method identified two main axes of microstructural variation in the amygdala, which could be translated to in vivo 7 Tesla MRI data in individual subjects. (3) Functional connectivity analysis using resting-state fMRI suggests that microstructurally defined amygdala subregions had distinct patterns of functional connectivity to cortical networks, particularly the limbic, frontoparietal, and default mode networks. (4) Meta-analytic decoding was used to suggest that the superior amygdala subregion's connectivity is associated with autobiographical memory, while the inferior subregion was linked to emotional face processing. (5) Overall, the data-driven, multimodal approach provides an account of amygdala microstructure and possibly function that can be applied at the individual subject level, potentially advancing research on amygdala organization.

      We thank the Reviewer for the positive comments and insightful evaluation of the work.

      (1.1) Although these are meritorious contributions there are some concerns that I will summarize below. The paper makes little-to-no contact with the monkey literature regarding the anatomy of amygdala subregions, their functionality, and their patterns of anatomical connectivity. This is surprising because such literature on non-human primates is a very important starting point for understanding the human amygdala. I recommend taking a careful look at the work by Helen Barbas, among others. There are too many papers to cite but a notable example is: Ghashghaei, H. T., Hilgetag, C. C., & Barbas, H. (2007). Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. Neuroimage, 34(3), 905-923. The work of Amaral is also highly relevant.

      As suggested, we included the important work of Amaral et al. as well as Ghashghaei et al. highlighting its contribution to mapping the intricate anatomy and function of the amygdala in non-human primates. We comment on this in the Introduction of the manuscript. Please see P.3.

      “Early research on the amygdala in non-human primates has been instrumental in understanding its intricate structure, function and patterns of anatomical connectivity (Amaral and Price 1984; Ghashghaei et al. 2007). This foundational study highlights the amygdala’s different subdivisions, most notably the basomedial nucleus (BM), basolateral nucleus (BL), and central nucleus (Ce) (Amaral et al. 1992). Furthermore, this work describes a dense network between these subdivisions and the prefrontal cortex, most strongly found in the posterior orbitofrontal and anterior cingulate areas.”

      (1.2) Furthermore, the authors subscribe to a model with LB, CM, and SF sectors. How does the SF sector relate to monkey anatomy?

      The overall organization of these subregions is largely conserved between humans and monkeys, reflecting their evolutionary relationship. While the basic subregional organization is conserved, there are still some important structural and functional differences between human and monkey amygdalae. For example, the SF subregion, as often described in humans, includes parts of the cortical nuclei (VCo), anterior amygdaloid area (AAA), amygdalohippocampal transition area (AHi), and amygdalopiriform transition area (APir), as well as the lateral olfactory tract (LOT). This remark was added in the Discussion, on P.12:

      “However, this region has been previously described as consisting of three main subdivisions: LB, CM, and SF, each composed of smaller subnuclei with distinct connectivity patterns and functions (Amunts et al. 2005; Ball et al. 2007; Bzdok et al. 2013; de Olmos and Heimer 1999). These subregions are largely conserved between humans and monkeys, reflecting their evolutionary relationship. However, there are still some considerable differences such as in the SF subregion, where its description in monkeys additionally contains the lateral olfactory tract (LOT) (De Olmos 1990).”

      (1.3) The authors use meta-analytical decoding via NeuroSynth. If the authors like those results of course they should keep them but the quality of coordinate reporting in the literature is insufficient to conclude much in the context of amygdala subregion function in my opinion. I believe the results reported are at most "somewhat suggestive".

      We agree with the Reviewer that use of data from NeuroSynth poses unique challenges, particularly relating to investigations of a small structure such as the amygdala. However, to clarify, these analyses decode the cortex-wide functional connectivity patterns of amygdala subregions and not activations within subregions defined by our microanatomical analyses. Additionally, comments from Reviewer 2 suggested expanding the NeuroSynth decoding to the contralateral hemisphere. As such, we decided to keep this analysis in the main manuscript but rephrase the interpretation of these findings in the Discussion to emphasize their exploratory nature on P.13:

      “Functional decoding of subregional functional connectivity patterns indicated possible dissociations in cognitive (e.g., memory) and affective (e.g., emotional face processing) functions of the amygdala, echoing previous accounts of this region’s involvement in associative processing of emotional stimuli. Notably, these findings link the functional connectivity profile of a subregion partially co-localizing with LB to emotional face processing. The LB subregion has been previously linked to associative processing related to the integration of sensory information (Bzdok et al. 2013; Ghods-Sharifi, St Onge, and Floresco 2009; Pessoa 2010; Winstanley et al. 2004; Boyer 2008), which is consistent with the association with visual emotional information processing identified in the present work.”

      (1.4) Another significant concern has to do with the results in Figure 3. The red and yellow clusters identified are quite distinct but the differences in functional connectivity are very modest. Figure 3C reveals very similar functional connectivity with the networks investigated. This is very surprising, and the authors should include a careful comparison with related findings in the literature. Overall, there is limited comparison between the observed results and those obtained via other methods. On a more pessimistic note, the results of Figure 3 seem to question the validity of the general approach.

      We agree with the Reviewer that we can indeed observe considerable overlap between functional connectivity profiles of amygdala subregions. The amygdala is a relatively small structure, leading to likely interconnectivity between its subregions (Bzdok et al. 2013) in addition to considering BOLD signal autocorrelation within this region. In addition, functional signals in the amygdala are affected by relatively lower signal-to-noise ratio (SNR), a limitation extending to temporobasal and mesiotemporal regions. Despite these challenges, our technique remained sensitive to detect subtle differences in connectivity patterns even in this small group of subjects in this restricted subcortical territory.

      In the revised manuscript, we further highlight these caveats in the Discussion (P.13):

      “Although these findings are promising, we also observe considerable overlap between functional connectivity networks of both our defined subregions. Indeed, the amygdala is a relatively small structure, leading to likely interconnectivity between its subregions and locally high signal autocorrelation. Functional connectivity and microstructure in the amygdala are certainly related, however previous work suggests they do not perfectly overlap (Bzdok et al. 2013). In addition, this region is affected by relatively low signal-to-noise ratio (SNR), as is observed in broader temporobasal and mesiotemporal territories.”

      (1.5) Some statements in the Discussion feel unwarranted. For example, "significant dissociation in functional connectivity to prefrontal structures that support self-referential, reward-related, and socio-affective processes." This feels way beyond what can be stated based on the analyses performed.

      We agree that this interpretation may reach beyond the analyses performed and reported findings. We have adjusted this portion of the text accordingly in our Discussion on functional connectivity findings (P.13):

      “Qualitatively, we found that the subregion defined by the highest 25% of U1 values mainly overlapped with what is commonly defined as the superficial and centromedial subregions, whereas the lowest 25% U1 values subregion overlapped mostly with the laterobasal division. Interestingly, CM and SF characterized subregions showed significantly stronger functional connectivity to prefrontal structures. This finding aligns with previous work demonstrating unique affiliations between the CM subregion and anterior cingulate and frontal cortices (Kapp, Supple, and Whalen 1994; Barbour et al. 2010), as well as between the SF subregion and the orbitofrontal cortex (Goossens et al. 2009; Caparelli et al. 2017; Pessoa 2010; Klein-Flügge et al. 2022).”
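
      The quartile-based definition of subregions mentioned in this passage can be sketched as follows. This is an illustration only: the function name and the example values are hypothetical, and the use of the inclusive quantile definition is an assumption, since the exact thresholding procedure is not described here.

```python
from statistics import quantiles


def split_by_quartiles(u1_values):
    """Return (low, high): indices of voxels whose U1 value falls in the
    lowest 25% and in the highest 25% of the distribution, respectively."""
    if len(u1_values) < 2:
        raise ValueError("at least two values are needed to compute quartiles")
    q1, _median, q3 = quantiles(u1_values, n=4, method="inclusive")
    low = [i for i, v in enumerate(u1_values) if v <= q1]
    high = [i for i, v in enumerate(u1_values) if v >= q3]
    return low, high


# Hypothetical U1 values for eight voxels.
u1 = [5, 1, 8, 2, 7, 3, 6, 4]
low_idx, high_idx = split_by_quartiles(u1)
print("lowest 25%:", low_idx)
print("highest 25%:", high_idx)
```

      The two index sets would then serve as masks for the subregions whose connectivity profiles are compared.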

      Additionally, we have also edited our Discussion to ensure that our interpretations are grounded in the analyses conducted, while framing the findings as potential avenues for future work. Please see P.13.

      “Functional decoding of functional connectivity results indicated possible dissociations in cognitive (e.g., memory) and affective (e.g., emotional face processing) functions of the amygdala, echoing previous accounts of this region’s functional specialization and subregional segregation of associative processing of emotional stimuli.”

      Recommendations for the authors:

      (1.6) Figure 1 has panels A-I but only A-D are discussed in the caption. The orientation of the slices is not indicated which makes it very hard to follow for most readers.

      The subpanels are now referred to in the revised Results. We also added a notation on the orientation of the slices and described them accordingly in our Figure 1 caption (P.5-6):

      “(A) The amygdala was segmented from the 100-micron resolution BigBrain dataset using an existing subcortical parcellation (Xiao et al. 2019). Slice orientation is consistent across all panels in this figure.”

      (1.7) Some figure references in the text seem to be incorrect; please check that the text refers to the correct figure number and panel.

      We thank the Reviewer for pointing this out. We thoroughly revised the correspondence between figure panel labels and their referencing in the text.

      Reviewer #2:

      This study bridges a micro- to macroscale understanding of the organization of the amygdala. First, using a data-driven approach, the authors identify structural clusters in the human amygdala from high-resolution post-mortem histological data. Next, multimodal imaging data are used to identify structural subunits of the amygdala and the functional networks in which they are involved. This approach is exciting because it permits the identification of both structural amygdalar subunits, and their functional implications, in individual subjects. There are, however, some differences in the macro and microscale levels of organization that should be addressed.

      Strengths:

      The use of data-driven parcellation on a structure that is important for human emotion and cognition, and the combination of this with high-resolution individual imaging-based parcellation, is a powerful and exciting approach, addressing both the need for a template-level understanding of organization as well as a parcellation that is valid for individuals. The functional decoding of rsfMRI permits valuable insight into the functional role of structural subunits. Overall, the combination of micro to macro, structure, and function, and general organization to individual relevance is an impressive holistic approach to brain mapping.

      We thank the Reviewer for their constructive and helpful feedback on our work.

      Weaknesses:

      (2.1) UMAP 1, as calculated from the histological data, appears to correlate well across individuals, and decently with the MRI data, although the medial-lateral coordinate axis is an outlier. UMAP 2, on the other hand, does not appear to correlate well with imaging data or across individuals. This does pose a problem with the claim that this paper bridges micro- and macroscale parcellations. One might certainly expect, however, that different levels of organization might parcellate differently, but the authors should address this in the discussion and offer ways forward.

      Data-driven methods hold several advantages for the quantitative extraction of signal from the underlying data in an observer-independent manner. However, these techniques are also sensitive to potential idiosyncrasies in the data. In the present work, our main analyses rely on the processing of a histological dataset (BigBrain) providing a unique opportunity for high-resolution analysis of amygdala histology and in vivo translation of findings leveraging ultra-high field MRI (n=10). However, both datasets are limited by their small sample size (n=1 for BigBrain and n=10 for MICA-PNI). As a result, we speculate that signal variations captured by U2 may be sensitive to artifacts or subject-specific sources of variance. Moving forward, this hypothesis could be assessed in future work via the analysis of larger histological and neuroimaging datasets to better track recurring features picked up by U2 or the association of these unique topographies with behavioural markers.

      As suggested, we included a section in our Discussion highlighting this shortcoming and the importance of larger datasets moving forward. Please see P.11-12.

      “However, it is important to note that both datasets analyzed in this work are limited by their small sample size (n=1 for BigBrain and n=10 for MICA-PNI). We speculate that the signal variations captured by U2 may be sensitive to artifacts or subject-specific sources of variance, potentially explaining why it was not consistent between subjects and modalities. Moving forward, this hypothesis could be assessed in future work via the analysis of larger histological and neuroimaging datasets to better track recurring features picked up by U2 or the association of these unique topographies with behavioural markers.”

      (2.2) It would be interesting to see functional decoding for the right amygdala. This could be included in the supplementary material. A discussion of differences in the results in the two hemispheres could be illuminating.

      In accordance with the Reviewer’s suggestion, we added Supplementary figure S2 exploring the decoding of connectivity profiles of the right amygdala stratified by its cytoarchitectural embedding with UMAP.

      Upon analysis, the dissociation in functional connectivity patterns over the right amygdala was less evident, leading to overall similar functional decoding across the two clusters. We refer to this Supplementary Figure in our Discussion on P.13.

      “For the right amygdala, dissociation in functional connectivity patterns were more subtle, leading to overall similar functional decoding across the two clusters. (Figure S2)”

      (2.3) The authors acknowledge that this mapping matches some but not all subunits that have been previously described in the amygdala. It would be helpful to neuroanatomists if the authors could discuss these differences in more detail in the discussion, to identify how this mapping differs and what the implications of this are.

      In our work, we focus on mapping the three well characterized amygdala subregions, specifically the superficial (SF), centromedial (CM) and laterobasal (LB) subdivisions. Qualitative histological accounts have indeed delineated multiple subunits within these subregions which we now describe in the revised manuscript. Due to the lower resolution of in vivo MRI data used in this work relative to post mortem histology, we focused our analyses on larger subregions that could be more reliably mapped to native quantitative T1 spaces of each participant. We now overview this issue in the Discussion. Please see P.12.

      “Although qualitative histological accounts have indeed delineated multiple subunits within these general regions, the present work focuses on three subdivisions (Amunts et al. 2005) to account for resolution disparities when translating our findings to in vivo MRI data. The LB subdivision includes the basomedial nucleus (Bm), basolateral nucleus (BL), lateral nucleus (LA) and paralaminar nucleus (PL). Moving medially, the CM subdivision includes the central (Ce) and medial nuclei (Me), while the SF subdivision includes the anterior amygdaloid area (AAA), amygdalohippocampal transition area (AHi), amygdalopiriform transition area (APir), and ventral cortical nucleus (VCo) (Heimer et al. 1999). However, disagreement on the precise attribution of nuclei to broader subdivisions motivated our investigations of probabilistic subunits of the amygdala (Kedo et al. 2018). The development of new tools to segment amygdala subnuclei in vivo opens opportunities for future work to further validate our framework at the precision of these nuclei within subjects (Saygin et al. 2017).”

      (2.4) The acronym UMAP is not explained. A brief explanation and description would be useful to the reader.

      We moved the expanded acronym from the Methods to the first instance of the term UMAP in our paper, found in the Introduction. As suggested, we also added a sentence describing the technique. Please see P.6.

      “We then applied Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction technique that preserves the local and global structure of high-dimensional data by projecting it into a lower-dimensional space (Becht et al. 2018), to the resulting 20-feature matrix to derive a 2-dimensional embedding of amygdala cytoarchitecture (Figure 1D).”

    1. eLife Assessment

      This important study provides insights into the role of the cerebellum in fear conditioning, addressing a key gap in the literature. The evidence presented is solid overall, although the theoretical framing and clarity of the results can be improved and some concerns remain about the reliability of results based on small numbers of trials. This work will be of interest to both the extinction learning and cerebellar research communities.

    2. Reviewer #1 (Public review):

      Nio and colleagues address an important question about how the cerebellum and ventral tegmental area (VTA) contribute to the extinction learning of conditioned fear associations. This work tackles a critical gap in the existing literature and provides new insights into this question in humans through the use of high-field neuroimaging with robust methodology. The presented results are novel and will broadly interest both the extinction learning and cerebellar research communities. As such, this is a very timely and impactful manuscript. However, there are several points that could be addressed during the review process to strengthen the claims and enhance their value for readers and the broader scientific community.

      Points to Address:

      (1) Reward Interpretation and Skin Conductance Responses (SCR):
      A central premise of the manuscript is that 'unexpected omissions of expected aversive events' are rewarding, which plays a critical role in extinction learning. The authors also suggest that the cerebellum is involved in reward processing. However, it is unclear how this conclusion can be directly drawn from their task, which does not explicitly model 'reward.' Instead, the interpretation relies on SCR, which seems more indicative of association or prediction rather than reward per se. Is SCR a valid metric of reward experienced during the extinction of feared associations? Or could these findings reflect processes tied more closely to predictive learning? Please discuss.

      (2) Reinforcement Agent and SCR Modeling:
      The modeling approach with the deep reinforcement agent treats SCR as a personalized expectation of shock for a given trial. However, this interpretation seems misaligned with participants' actual experience - they are aware of the shock but exhibit evolving responses to it over time. Why is this operationalization useful or valid? It would benefit the manuscript to provide a clearer justification for this approach.

      (3) Clarity and Visualization of Results:
      The results section is challenging to follow, and the visualization and quantification of findings could be significantly improved. Terms like 'trending' appear frequently - what does this mean, and is it worth reporting? Adding clear statistical quantifications alongside additional visualizations (e.g., bar or violin plots of group means within specific subregions within the cerebellum, or grouped mean activity in VTA and DCN) would enhance clarity and allow readers to better assess the distribution and systematicity of effects. Furthermore, the figures are overly complex and difficult to read due to the heavy use of abbreviations. Consider splitting figures by either phase of the experiment or regions, and move some details to the supplemental material for improved readability.

      (4) Theoretical Context for Paradigm Phases:
      The manuscript benefits from the comprehensive experimental paradigm, which includes multiple phases (acquisition, extinction, recall, reacquisition, re-extinction). This design has great potential for providing a more holistic view of conditioned fear learning and extinction. However, the manuscript lacks clarity on what insights can be drawn from these distinct phases. What theoretical framework underpins the different stages, and how should the results be interpreted in this context? At present, the findings seem like a display of similar patterns across phases without sufficient interpretation. Providing a stronger theoretical rationale and reorganizing the results by experimental phase could significantly improve readability and impact.

      (5) Cerebellum-VTA Connectivity Analysis:
      The authors argue that the cerebellum modulates VTA activity, yet they perform the PPI analysis in the reverse direction. Why does this make sense? In their DCM analysis, they found a bidirectional relationship (both cerebellum - VTA and VTA-cerebellum), yet the discussion focused on connectivity from the cerebellum to VTA. A more careful interpretation of the connectivity findings would be useful - especially the strong claims in the discussion on the cerebellum providing the reward signal to the VTA should be tempered.

    3. Reviewer #2 (Public review):

      Summary:

      Building upon the group's previous work, this study used a 3-day threat acquisition, extinction, recall, reextinction, and reacquisition paradigm with 7T imaging to probe the mechanism by which the cerebellum contributes to fear extinction learning. The authors hypothesise this may be via its connection to the VTA, a known modulator of fear extinction due to its role in reward processing. Using complementary analysis methods, the authors demonstrate that activity within the cerebellum, DCN, and VTA is modulated by predictions about the occurrence of the US, which shows regional specificity. They show trend-level evidence that there is increased functional connectivity between the cerebellum and VTA during all phases of the paradigm with unexpected omissions. They also present a DCM which indicates that the cerebellum could positively modulate VTA activity during extinction learning. This study adds to a growing literature supporting the role of the historically overlooked cerebellum in the control of emotions and suggests that an interaction between the cerebellum and VTA should be considered in the existing model of the fear extinction network.

      Strengths:

      The authors address their research question using a number of complementary methods, including parametric modulation by model-derived expectation parameters, PPI, and DCM, in a logical and easily understood way. I feel the authors provide a balanced interpretation of their findings, presenting numerous interpretations and offering insight with regard to reward vs attention or unsigned prediction errors and the directionality of the interaction they identify. The manuscript is a timely addition to the growing literature highlighting the role of the cerebellum in fear conditioning, and emotion generation and regulation more generally.

      Weaknesses:

      Subjective and skin conductance responses do not completely support the success of the learning paradigm. For example, CS+/CS- differentiation in both domains persisted after extinction training. I do not feel that this negates the findings of this manuscript, though it raises questions about the parametric modulators used, and the interpretation of the neural mechanisms proposed if they do not strongly relate to updated subjective appraisals (the goal of extinction therapy). My interpretation of the manuscript suggests there are some key results based upon contrasts that have as few as three events; I am a little unsure about the power and reliability of these effects, though I await author clarification on this matter. There are a number of unaddressed deviations from the pre-registered protocol that I have asked the authors to elaborate upon.

    4. Author response:

      Reviewer 1:

      (1) Reward Interpretation and Skin Conductance Responses (SCR):

      The reviewer raises a valid point, as the model from which we derive prediction errors describes predictive learning—specifically, the occurrence of shock—without incorporating additional reward learning effects. SCRs are used to fit the model’s hyperparameters but do not directly measure reward; rather, they serve as a marker of arousal.

      In our paradigm, SCRs are measured during CS presentation and primarily reflect predictive learning, as they are closely linked to contingency awareness. The association between estimated prediction errors during unexpected US omissions and reward remains reliant on existing literature.

      In the revised manuscript, we will further elaborate on these points to clarify the distinction between predictive learning and direct reward processing, while contextualizing our findings within the broader literature on reward signaling and fear extinction.
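To make the notion of a prediction error at an unexpected US omission concrete, the sketch below uses a plain Rescorla-Wagner style delta rule. This is a deliberately simpler, swapped-in technique and not the authors' model (which the reviewer describes as a deep reinforcement agent); the function name, the learning rate of 0.5, and the trial sequence are all assumptions chosen for illustration.

```python
def delta_rule(outcomes, learning_rate=0.5, initial_expectation=0.0):
    """Return per-trial (expectation, prediction_error) pairs."""
    expectation = initial_expectation
    trace = []
    for outcome in outcomes:  # 1 = US delivered, 0 = US omitted
        error = outcome - expectation  # signed prediction error
        trace.append((expectation, error))
        expectation += learning_rate * error  # update toward the outcome
    return trace

# Hypothetical sequence: three reinforced trials, then two omissions.
trials = [1, 1, 1, 0, 0]
trace = delta_rule(trials)
```

In this toy sequence the first omission yields the largest negative prediction error, and the error shrinks on the next omission as the expectation of shock is revised downward. Whether such a signal is experienced as rewarding is exactly the interpretive question raised above and is not something the sketch can settle.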

      (2) Reinforcement Agent and SCR Modeling:

      Notably, we do not use SCR as a personalized expectation measure due to its limited reliability at the individual level; instead, the model's hyperparameters are fitted on the entire SCR dataset, yielding per-trial prediction and prediction error estimates for each CS sequence rather than for individual participants.

      (3) Clarity and Visualization of Results:

      We recognize that the presentation of our results can be improved and will take steps to enhance figure clarity, also ensuring that trend-level results are clearly distinguished.

      (4) Theoretical Context for Paradigm Phases:

      Regarding the differences across experimental phases, we recognize the theoretical significance of these distinctions. However, our primary focus is on identifying commonalities in unexpected US omission responses across phases rather than emphasizing phase-specific differences. Nevertheless, we will provide a brief clarification on phase differences to enhance the manuscript’s interpretability.

      (5) Cerebellum-VTA Connectivity Analysis:

      Furthermore, we acknowledge that our conclusion regarding the modulation of the dopaminergic system by the cerebellum should be framed more cautiously. We will temper our claims to better reflect the bidirectional and potentially indirect nature of cerebellum-VTA interactions. Additionally, we plan to include PPI results from a cerebellar seed that show connectivity with the VTA, potentially in the supplementary material.

      Reviewer 2:

      (1) Success of extinction learning based on Self-reports and SCRs?

      The reviewer points to a problem, which is inherent to extinction learning: The initial fear association is not erased, but merely inhibited, and is prone to return. Although the recall phase follows the extinction phase, we did not expect a complete inhibition of the conditioned response; instead, spontaneous recovery is expected. In fact, the spontaneous recovery observed in the recall phase provided us with an additional opportunity to investigate unexpected US omissions, which was our primary focus.

      (2) Concerns on reliability of event-based contrasts using three events:

      Regarding concerns about the reliability of analyses based on three events, we believe that the consistency between our parametric modulation analysis, which incorporates all events, and the three-event analysis provides further support for the observed patterns. We are currently discussing additional analyses to further verify the reliability of the three-event contrasts.

      (3) Deviations from preregistration:

      Finally, we will carefully review all deviations from our preregistration to ensure transparency. Any methodological or analytical changes will be explicitly addressed in the revised manuscript.

    1. eLife Assessment

      This research addresses an important and timely topic in cancer treatment, as the authors present a novel computational tool, 'retriever,' which has the potential to revolutionize personalized cancer treatment strategies by predicting effective drug combinations for triple-negative breast cancer. The strength of the evidence presented is solid, as evidenced by the systematic testing of 152 drug response profiles and 11,476 drug combinations.

    2. Reviewer #1 (Public review):

      Summary:

      Identifying drugs that target specific disease phenotypes remains a persistent challenge. Many current methods are only applicable to well-characterized small molecules, such as those with known structures. In contrast, methods based on transcriptional responses offer broader applicability because they do not require prior information about small molecules. Additionally, they can be rapidly applied to new small molecules. One of the most promising strategies involves the use of "drug response signatures": specific sets of genes whose differential expression can serve as markers for the response to a small molecule. By comparing drug response signatures with expression profiles characteristic of a disease, it is possible to identify drugs that modulate the disease profile, indicating a potential therapeutic connection.

      This study aims to prioritize potential drug candidates and to forecast novel drug combinations that may be effective in treating triple-negative breast cancer (TNBC). Large consortia, such as the LINCS-L1000 project, offer transcriptional signatures across various time points after exposing numerous cell lines to hundreds of compounds at different concentrations. While this data is highly valuable, its direct applicability to pathophysiological contexts is constrained by the challenges in extracting consistent drug response profiles from these extensive datasets. The authors use their method to create drug response profiles for three different TNBC cell lines from LINCS.

      To create a more precise, cancer-specific disease profile, the authors highlight the use of single-cell RNA sequencing (scRNA-seq) data. They focus on TNBC epithelial cells collected from 26 diseased individuals compared to epithelial cells collected from 10 healthy volunteers. The authors are further leveraging drug response data to develop inhibitor combinations.

      Strengths:

      The authors of this study contribute to an ongoing effort to develop automated, robust approaches that leverage gene expression similarities across various cell lines and different treatment regimens, aiming to predict drug response signatures more accurately. The authors are trying to address the gap that remains in computational methods for inferring drug responses at the cell subpopulation level.

      Weaknesses:

      One weakness is that the authors do not compare their method to previous studies. The authors develop a drug response profile by summarizing the time points, concentrations, and cell lines. The computational challenge of creating a single gene list that represents the transcriptional response to a drug across different cell lines and treatment protocols has been previously addressed. The Prototype Ranked List (PRL) procedure, developed by Iorio and co-authors (PNAS, 2010, doi:10.1073/pnas.1000138107), uses a hierarchical majority-voting scheme to rank genes. This method generates a list of genes that are consistently overexpressed or downregulated across individual conditions, which then hold top positions in the PRL. The PRL methodology was used by Aissa and co-authors (Nature Comm 2021, doi:10.1038/s41467-021-21884-z) to analyze drug effects on selective cell populations using scRNA-seq datasets. They combined PRL with Gene Set Enrichment Analysis (GSEA), a method that compares a ranked list of genes like PRL against a specific set of genes of interest. GSEA calculates a Normalized Enrichment Score (NES), which indicates how well the genes of interest are represented among the top genes in the PRL. Compared to the method described in the current manuscript, the PRL method allows for the identification of both upregulated and downregulated transcriptional signatures relevant to the drug's effects. It also gives equal weight to each cell line's contribution to the drug's overall response signature.
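The comparison of a ranked gene list against a gene set, as described above for GSEA, can be illustrated with a simplified running-sum enrichment score. This is a sketch under stated assumptions: it uses the unweighted form of the statistic, omits the permutation-based normalization needed to obtain an NES, and the gene names are hypothetical placeholders rather than genes from any PRL.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score of gene_set in ranked_genes."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene_set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene in the set, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        # Keep the largest deviation from zero, preserving its sign.
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]  # hypothetical ranked list
es_top = enrichment_score(ranked, {"G1", "G2"})
es_bottom = enrichment_score(ranked, {"G5", "G6"})
```

A gene set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score, which is how a ranked list can capture both upregulated and downregulated signatures.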

      The authors performed experimental validation of the top two identified drugs; however, the effect was modest. In addition, the effect on TNBC cell lines was cell-line specific as the identified drugs were effective against BT20, whose transcriptional signatures from LINCS were used for drug identification, but not against the other two cell lines analyzed. An incorrect choice of genes for the signature may result in capturing similarities tied to experimental conditions (e.g., the same cell line) rather than the drug's actual effects. This reflects the challenges faced by drug response signature methods in both selecting the appropriate subset of genes that make up the signature and in managing the multiple expression profiles generated by treating different cell lines with the same drug.

    3. Reviewer #2 (Public review):

      Summary:

      In their study, Osorio and colleagues present 'retriever,' an innovative computational tool designed to extract disease-specific transcriptional drug response profiles from the LINCS-L1000 project. This tool has been effectively applied to TNBC, leveraging single-cell RNA sequencing data to predict drug combinations that may effectively target the disease. The public review highlights the significant integration of extensive pharmacological data with high-resolution transcriptomic information, which enhances the potential for personalized therapeutic applications.

      Strengths:

      A key finding of the study is the prediction and validation of the drug combination QL-XII-47 and GSK-690693 for the treatment of TNBC. The methodology employed is robust, with a clear pathway from data analysis to experimental confirmation.

      Weaknesses:

      However, several issues need to be addressed. The predictive accuracy of 'retriever' is contingent upon the quality and comprehensiveness of the LINCS-L1000 and single-cell datasets utilized, which is an important caveat as these datasets may not fully capture the heterogeneity of patient responses to treatment. While the in vitro validation of the drug combinations is promising, further in vivo studies and clinical trials are necessary to establish their efficacy and safety. The applicability of these findings to other cancer types also warrants additional investigation. Expanding the application of 'retriever' to a broader range of cancer types and integrating it with clinical data will be crucial for realizing its potential in personalized medicine. Furthermore, as the study primarily focuses on kinase inhibitors, it remains to be seen how well these findings translate to other drug classes.

    1. eLife Assessment

      This important study provides new evidence on the role of norepinephrine (NE) release in the hippocampus in response to environmental transitions (event boundaries), providing a potential link between NE signaling and the segmentation of episodic memories. The work is solid, employing innovative techniques such as fiber photometry with the GRAB-NE sensor for NE measurement, the analysis of public electrophysiology hippocampal datasets, and well-controlled experiments. While further analysis could strengthen some claims, this work offers insights into memory, neuromodulation, and hippocampal function.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the role of norepinephrine (NE) signaling in the hippocampus during event transitions, positing that NE release serves as a mechanism for marking event boundaries to facilitate episodic memory segmentation. The authors use a genetically encoded fluorescent indicator (GRABNE) to measure NE release with high temporal precision, correlating these signals with changes in hippocampal firing dynamics. By integrating photometry data, behavioral analyses, and analysis of neuronal activity from publicly available datasets, the work addresses fundamental questions about the relationship between neuromodulatory signals and memory encoding.

      Strengths:

      The authors present a compelling framework linking NE signaling to event boundaries, offering insight into how episodic memory segmentation may occur in the brain. The writing is clear and the data are well-described. It is easy to follow. The pharmacological validation of the GRABNE sensor enhances confidence in their NE measurements, an important methodological strength given the potential limitations of fluorescence-based neuromodulatory indicators. Moreover, the authors carefully disentangle NE signals from confounding behavioral variables, providing evidence that NE release is time-locked to event boundaries rather than movement or arousal-related behaviors. This level of analytical rigor strengthens their central claims. Additionally, the observation of NE signal dynamics that decay over hundreds of seconds is interesting, as it aligns with timescales relevant to hippocampal plasticity reported in prior literature.

      Weaknesses:

      While the authors establish correlations between NE signaling and hippocampal activity changes, causation is not demonstrated. Future studies using perturbative approaches (e.g., optogenetic or chemogenetic manipulation of NE release) would be necessary to establish a direct causal link. Furthermore, the persistence of NE signals over long timescales (hundreds of seconds) raises questions about its role in encoding rapid event boundaries, as it is unclear how this prolonged signaling might affect memory encoding for closely spaced events. The lack of a discussion about how NE dynamics would operate in such scenarios weakens the proposed framework. Finally, while the authors acknowledge the limitations of the GRABNE sensor, a more detailed exploration of how sensor sensitivity might influence their results would enhance the interpretation of their findings.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a genetically encoded fluorescent sensor, GRABNE, to measure NE dynamics in the dorsal hippocampus of mice in response to multiple behavioral manipulations. A non-linear model and regression were used to quantitatively assess the contribution of multiple behavioral covariates to changes in NE signaling, with the result that NE signal dynamics were best predicted by time from event transitions, with the signal exponentially decaying over a period of seconds to minutes after transitions. Event transitions were implemented as a transfer from a home cage to a novel arena, a transfer to a familiar linear track, and the introduction of novel objects. Additional experiments showed that spatial context transitions dominate NE signaling over novel object presentations, and experience accelerates the decay of the NE signal after spatial context transitions. Correspondingly, the hippocampal CA1 spatial code takes minutes to stabilize after context transition in both novel and familiar spaces.
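The exponential decay of the NE signal after an event transition can be sketched with a simple fit. The example below assumes a single exponential with zero baseline and strictly positive samples, and fits it by linear regression on the log-transformed signal; the function name and the synthetic data are assumptions, and the authors' own non-linear model is not specified here and may differ.

```python
import math

def fit_exponential_decay(times, signal):
    """Fit signal ~ A * exp(-t / tau) by least squares on log(signal)."""
    logs = [math.log(s) for s in signal]  # requires strictly positive samples
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # Amplitude A and time constant tau (tau is positive for a decaying signal).
    return math.exp(intercept), -1.0 / slope

# Synthetic, noise-free trace with A = 2 and tau = 60 s (illustrative only).
times = [0.0, 30.0, 60.0, 90.0, 120.0]
signal = [2.0 * math.exp(-t / 60.0) for t in times]
amplitude, tau = fit_exponential_decay(times, signal)
```

Comparing the fitted time constant across conditions (for example novel versus familiar contexts) is one way the reported acceleration of decay with experience could be quantified, although real photometry traces would need baseline and noise handling that this sketch leaves out.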

      Strengths:

      A strength of the study is the use of the NE sensor with sub-second resolution, non-linear modeling, and regression to identify the prominent variable of interest as time from event transition, and multiple behavioral controls. The use of multiple behavioral designs to investigate the effect of familiarity, experience, and interaction of spatial context transitions and novel object introduction is a strength. Relating the dynamics of NE signal decay to the rate of CA1 spatial code changes is also a strength.

      Weaknesses:

      A minor weakness is that the concept of an event boundary needs to be more broadly discussed. The manuscript uses event transitions such as spatial context changes and novel object introduction to implement an event boundary. However, especially in episodic memory studies in humans, event structure and boundaries have also been shown to occur through the automatic segmentation of experiences into discrete events (Baldassano et al., Neuron, 2017; Radvansky and Zacks, Curr. Opin. Behav. Sci., 2017). The rodent experiments in the current manuscript explicitly introduce event boundaries through changes in context or objects, which can potentially be conflated with novelty. A discussion of these differences, and whether NE can also have a role in event boundary transitions based on automatic segmentation of experiences, will add to the impact of the manuscript.

    4. Reviewer #3 (Public review):

      Summary

      The manuscript investigates the role of norepinephrine (NE) release in the rodent hippocampus during event boundaries, such as transitions between spatial contexts and the introduction of novel objects. It also explores how NE release is altered by experience and how novelty drives the amplitude and decay times of extracellular NE. By utilizing the GRABNE sensor for sub-second resolution measurement of NE, the authors demonstrate that NE release is driven primarily by the time elapsed since an event boundary and is independent of behaviors like movement or reward. The study further explores how hippocampal neural representations are altered over time, showing that these representations stabilize shortly after event transitions, potentially linking NE release to episodic memory encoding.

      Strengths

      Overall, the work provides novel insights into the interplay between NE signaling and hippocampal activity and presents an intriguing hypothesis on how NE release may help push hippocampal activity into unique attractor states to encode novel experiences. The experiments are well-controlled, and the analysis is well-presented, with a detailed and engaging discussion that points towards several new and exciting research directions. The use of several behavioral paradigms to demonstrate the strongest predictor of NE release is a strength, as well as the regression analysis to disambiguate the contribution of other correlated variables. The suggestion that NE does not select ensembles for subsequent replay is also an interesting result.

      Weaknesses

      The authors have not convincingly established a link between hippocampal neural activity and NE release, showing qualitative rather than quantitative correlations. Therefore, at this stage, the role of NE in hippocampal function remains speculative.

      Another general concern is that the smoothing/kinetics of the sensor impacts the regression analyses. Most of the other variables, such as speed, acceleration, and even reward time points, are highly dynamic, and it is possible that the limitations of the sensor decorrelate the signal from (potentially) causal variables, therefore resulting in the time since the event start having the most explanatory power for most of the analyses.

      More broadly, the figure legends should be expanded to better describe error bounds, mean vs median, sample sizes, and averaging choices for plots.

      There are also some concerns regarding the nearest neighbor analysis and the reported differences in the rate of reactivations after familiar and novel environments, as outlined below.

      (1) Lines 657-658. How far away in time can the top three nearest neighbor time points be? Must they lie in different trials, or can they also be within the same trial? Is there a systematic difference in the average time lags for the nearest neighbors over the course of the session?

      The authors should only allow nearest neighbors to be in a different lap because systematic changes in behavior (running fast initially) might force earlier time bins in a certain location to match with a different trial, while the later time bins can be from within the same trial if the mice are moving slower and stay in the same spatial bin location longer. The authors should also provide information on how the averaging is performed because there are several axes of variability - spatial bin locations, sessions, different environments, and animals.

      (2) Figure 8: These results are very interesting. However, I am confused by the differences between Figure 8B and D because the significant reactivations in A and C are very similar. The 1-minute and 10-minute windows seem somewhat arbitrary and prone to noise and variability. Perhaps the authors should fit a slope for the curves in A and C and compare whether the slope/intercept are significantly different between the novel and familiar environments.
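
      The slope comparison suggested in point (2) could be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes one reactivation-rate curve per session, fits an ordinary least-squares line to each, and compares the per-session slopes between environments with a label-permutation test. The function names `fit_line` and `permutation_test_means` and all numbers are hypothetical.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def permutation_test_means(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Group labels are shuffled n_perm times; the p-value is the fraction of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical example: minutes since transition vs. reactivation rate,
# one curve per session (made-up numbers, for illustration only).
minutes = [1, 2, 4, 6, 8, 10]
novel_sessions = [[9, 8, 6, 4, 3, 1], [10, 8, 7, 5, 2, 1], [9, 9, 6, 5, 3, 2]]
familiar_sessions = [[5, 5, 4, 4, 4, 3], [6, 5, 5, 4, 4, 4], [5, 4, 4, 4, 3, 3]]

novel_slopes = [fit_line(minutes, y)[0] for y in novel_sessions]
familiar_slopes = [fit_line(minutes, y)[0] for y in familiar_sessions]
slope_diff, p_value = permutation_test_means(novel_slopes, familiar_slopes)
```

      The same test can be run on the fitted intercepts. Using the whole curve avoids committing to a particular 1-minute or 10-minute window.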

    1. eLife Assessment

      This study examined the important question of how neurons code temporal information across the hippocampus, dorsal striatum, and orbitofrontal cortex. Using a behavioral task in the rat that requires discrimination between short and long time intervals, the authors conclude that time intervals are represented in all three regions and that synchronized activity of time-coding cells across the brain regions is coordinated by theta rhythms. However, several weaknesses are noted, and in its current form, the study provides incomplete evidence for understanding how temporal information is processed and coordinated throughout these brain networks.

    2. Reviewer #1 (Public review):

      Summary:

      It is known that neuronal activity in several brain regions encodes interval time. However, how interval time is encoded across distributed brain regions remains unclear. By simultaneously recording neuronal activity from the hippocampal CA1, dorsal striatum, and orbitofrontal cortex during a temporal bisection task, the authors showed that elapsed time during the interval period is encoded similarly across these regions and that the neuronal activity of time cells across these regions tends to be synchronized within 100 ms. Using Bayesian decoding, they demonstrated that the interval time decoded from the firing activity of time cells in these regions correlated with the rats' decisions and that the times decoded from the neuronal activity of different brain regions were correlated. The sound experiments and analyses support most of the main conclusions of this paper.

      Strengths:

      They used a temporal bisection task in which the effects of time and distance can be dissociated. The test trials successfully revealed the relationship between the interval time estimated by Bayesian decoding and the animal's judgment of long versus short interval times. Simultaneous recording of neuronal activity from the hippocampal CA1, dorsal striatum, and orbitofrontal cortex, which is technically challenging, allowed comparison of interval time encoding across brain regions and the degree of synchrony between neurons from different brain regions.

      Weaknesses:

      Some analyses were not explained in detail, making it difficult to assess whether their results support the authors' conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors examined how neural activity related to temporal information is distributed and coordinated throughout the hippocampus, dorsal striatum, and orbitofrontal cortex. Rats were forced to run for fixed time intervals on a treadmill and make a decision based on whether the interval was long (10s) or short (5s). Under these conditions, time cells were observed across all examined brain regions. The primary finding of the authors is that synchronized activity between time cells across brain regions is entrained into the theta cycle. This observation is used to support the central claim that the sharing of temporal information is mediated by the theta oscillation.

      Strengths:

      By simultaneously recording several brain regions in an interval discrimination task, the authors provide a valuable dataset for understanding how temporal information is processed and distributed throughout relevant networks.

      Weaknesses:

      Several methodological concerns should be addressed and a more focused analysis should be performed to strengthen the central claims of this work.

      Major Concerns

      (1) The restriction to only use time cells to understand temporal information processing. Other mechanisms of encoding time, like population clocks and ramping, have been characterized in the striatum and frontal cortex, and these dynamics might contain more temporal information than the subset of cells that meet the statistical criteria for being a time cell. Furthermore, time cells in the OFC, and DS in particular, appear to be heavily biased towards the beginning of treadmill running. This raises the question of whether temporal information can be encoded by neurons other than time cells in these two regions.

      (2) The results of the Bayesian decoding analysis should be expanded on. In particular, the performance of each decoder above the chance level is not quantified. Comparing the performance of decoders trained on all cells to the performance of decoders trained on time cells alone would partially address the question of whether or not time cells are the only cells that can encode temporal information in the DS and OFC.

      (3) The decoding results for the test trials appear different from the results in the authors' previous publication (Shimbo et al., 2021). There, differences in decoded time between the selected-long and selected-short trials emerged after 5s, the duration of the short trials. This was to be expected given the following two reasons. First, from the task design, it is unclear that the animal can distinguish trial types (long, short, or test) until after the first 5 seconds of treadmill running, making it logical for differences in decoded time to emerge only after this point. Second, time cell activity was identical in the first 5s of the long and short trials as shown in Figure 2A. Here, however, the differences in decoded time during the selected-long and selected-short test trials emerge within the first 2s of treadmill running. Could the authors explain this discrepancy?

      Furthermore, in Figure 6B, at 3 seconds of running time, the decoded time for selected-long and selected-short trials shows a difference of nearly 2 seconds, with no further increase as running time progresses. In contrast, at 2 seconds of running time, there is no significant difference in decoded time for DS and OFC, while CA1 shows a slight increase in the decoded time for selected-long trials. This pattern suggests a sudden jump in the encoded time for selected-long trials between 2 and 3 seconds. However, without explicitly showing the raw data, it is difficult to interpret this result and other results from the decoding analysis.
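
      One way to quantify decoder performance against chance, as requested in major concern (2), is sketched below. It is an assumption-laden illustration rather than the authors' procedure: decoding error is taken as the mean absolute difference between decoded and actual elapsed time, and the chance level is estimated by shuffling the actual time labels across bins. The names `mean_abs_error` and `chance_level_test` are hypothetical.

```python
import random
from statistics import mean


def mean_abs_error(decoded, actual):
    """Mean absolute difference between decoded and actual elapsed time."""
    return mean(abs(d - a) for d, a in zip(decoded, actual))


def chance_level_test(decoded, actual, n_shuffles=1000, seed=0):
    """Compare decoding error with a label-shuffled chance distribution.

    Returns the observed error, the mean error under shuffling, and a
    one-sided p-value (fraction of shuffles with error <= observed).
    """
    rng = random.Random(seed)
    observed = mean_abs_error(decoded, actual)
    shuffled = list(actual)
    null_errors = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the pairing between decoded and actual time
        null_errors.append(mean_abs_error(decoded, shuffled))
    p = (1 + sum(e <= observed for e in null_errors)) / (n_shuffles + 1)
    return observed, mean(null_errors), p
```

      Running the same function on a decoder trained on time cells only and on a decoder trained on all cells would give two directly comparable error and chance estimates per region.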

      Minor Concerns

      (1) It is not clear how the Bayes decoder was trained. Does the training data come entirely from the long trials?

      (2) For Figure 5D, even if only one of two neurons in a pair has its spike rate modulated by theta, wouldn't the expectation be that synchronous spike events between these two neurons would be modulated by theta as well? This analysis might benefit from shuffling methods to determine if the mean resultant length of synchronous spike events is larger than the chance level.

      (3) In Figure 5A, the authors suggest that 'the synchronization of time cells was modulated by theta oscillation.' However, it is unclear whether the population exhibits a preferred theta phase or the phase preference only occurs at the individual cell level. If there is no preference on the population level, how would the authors interpret this result?
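
      The shuffle control raised in minor concern (2) could take the following form. This is a sketch under stated assumptions, not the method used in the manuscript: if synchronous spikes merely inherit the theta locking of one modulated neuron, they should behave like a random subsample of that neuron's spikes, so the null distribution is built from size-matched random subsamples of that neuron's theta phases. Matching the sample size also controls for the upward bias of the mean resultant length in small samples. The function names and the choice of null are hypothetical.

```python
import cmath
import math
import random


def mean_resultant_length(phases):
    """Length of the mean unit vector of phases in radians (0 = uniform, 1 = fully locked)."""
    if not phases:
        return 0.0
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def subsample_null_test(all_phases, sync_phases, n_shuffles=1000, seed=0):
    """Test whether synchronous spikes are more phase-locked than expected.

    all_phases:  theta phases of every spike of the modulated neuron.
    sync_phases: theta phases of the subset of those spikes that were
                 synchronous with the partner neuron.
    Returns the observed mean resultant length and a one-sided p-value.
    """
    rng = random.Random(seed)
    observed = mean_resultant_length(sync_phases)
    k = len(sync_phases)
    null = [
        mean_resultant_length(rng.sample(all_phases, k))
        for _ in range(n_shuffles)
    ]
    p = (1 + sum(r >= observed for r in null)) / (n_shuffles + 1)
    return observed, p
```

      A small p-value here would indicate that synchronous events are more tightly locked to theta than the neuron's spikes in general, which is the stronger claim the reviewer is asking the authors to support.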

    4. Reviewer #3 (Public review):

      Summary:

      This study examines neural activity recorded simultaneously in the hippocampus, dorsal striatum, and orbitofrontal cortex as rats performed an interval timing task. The analyses primarily focus on the activity of "time cells" which are neurons that fire at specific moments during the intervals. In this experiment, the intervals consist of periods when animals are running on a treadmill before selecting the arm associated with the interval duration. The results show that the theta oscillations induced by this running behavior were observed across the three regions and that this strong oscillation modulated the activity of neurons across regions. While these findings are correlative in nature, they provide an important characterization of activity patterns across regions during complex behavior. However, more research is needed to determine whether these activity patterns specifically contribute to temporal coding.

      Strengths:

      (1) Overall, the paper is very well written. Although I have specific concerns about the review of the relevant literature and the interpretation of the results (see below), I do want to commend the authors for their efforts toward presenting this complex work in an accessible manner.

      (2) The study is well designed and the quality of the electrophysiological data collected from multiple brain regions in such a challenging behavioral experiment is impressive. This work is a technical tour de force.

      (3) The analyses are very thorough, statistically rigorous, and clearly explained and visualized. The authors provide a thoughtful mixture of example data (at the level of individual cells or animals) and aggregated data (at the group or session level) to properly explain and quantify the activity patterns of interest.