  1. Last 7 days
    1. eLife assessment

      This study presents valuable findings that examine both how Down syndrome (DS)-related physiological, behavioral, and phenotypic traits track across time, as well as how chronic treatment with green tea extracts enriched in epigallocatechin-3-gallate (GTE-EGCG), administered in drinking water spanning prenatal through 5 months of age, impacts these measures in wild-type and Ts65Dn mice. The strength of the evidence is incomplete, due to high variability across measures, perhaps in part attributable to a failure to include sex as a factor for measures known to be sexually dimorphic. This study is of interest to scientists interested in Down syndrome and its treatment, as well as scientists who study disorders that impact multiple organ systems.

    1. eLife assessment

      This important work provides a thorough and detailed analysis of natural variation in C. elegans egg-laying behavior. The authors present convincing evidence to support their hypothesis that variations in egg-laying behavior are influenced by trade-offs between maternal and offspring fitness. This study establishes a framework for elucidating the molecular mechanisms underlying this paradigm of behavioral evolution.

    1. Reviewer #1 (Public Review):

      Summary:

      In the paper "Disentangling the relationship between cancer mortality and COVID-19", the authors study whether the number of deaths in cancer patients in the USA went up or down during the first year (2020) of the COVID-19 pandemic. They found that the number of deaths with cancer mentioned on the death certificate went up, but only moderately. In fact, the excess with-cancer mortality was smaller than expected if cancer had no influence on the COVID mortality rate and all cancer patients got COVID with the same frequency as in the general population. The authors conclude that the data show no evidence of cancer being a risk factor for COVID and that the cancer patients were likely actively shielding themselves from COVID infections.

      Strengths:

      The paper studies an important topic and uses sound statistical and modeling methodology. It analyzes both deaths with cancer listed as the primary cause of death and deaths with cancer listed as one of the contributing causes. The authors argue, correctly, that the latter is a more important and reliable indicator to study relationships between cancer and COVID. The authors supplement their US-wide analysis by analysing three states separately.

      Weaknesses:

      The main findings of the paper can be summarized as six numbers. Nationally, in 2020, multiple-cause cancer deaths went up by 2%, Alzheimer's deaths by 31%, and diabetes deaths by 39%. At the same time, assuming no relationship between these diseases and either Covid infection risk or Covid mortality risk, the deaths should have gone up by 7%, 46%, and 28%. The authors focus on cancer deaths and, as 2% < 7%, conclude that cancer is not a risk factor for COVID and that cancer patients must have "shielded" themselves against Covid infections.

      However, I did not find any discussion of the other two diseases. For diabetes, the observed excess was 39% instead of "predicted by the null model" 28%. I assume this should be interpreted as diabetes being a risk factor for Covid deaths. I think this should be spelled out, and also compared to existing estimates of increased Covid IFR associated with diabetes.

      And what about Alzheimer's? Why was the observed excess 31% vs the predicted 46%? Is this also a shielding effect? Does the spring wave in NY provide some evidence here? Why/how would Alzheimer's patients be shielded? In any case, this needs to be discussed and currently, it is not.
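      As an illustrative sketch of the comparison raised above (not code from the paper or the review), the six percentages quoted by the reviewer can be tabulated as observed excess minus the excess predicted by the null model. The dictionary keys and the `compare` helper are hypothetical names chosen for this example; only the six numbers come from the review.

```python
# Percent excess deaths quoted in the review (national, multiple-cause).
observed = {"cancer": 2, "alzheimers": 31, "diabetes": 39}
# Excess predicted by the null model (no link between the disease and
# either Covid infection risk or Covid mortality risk).
null_model = {"cancer": 7, "alzheimers": 46, "diabetes": 28}


def compare(observed, null_model):
    """Return the observed-minus-null gap (percentage points) and a label."""
    result = {}
    for disease, obs in observed.items():
        gap = obs - null_model[disease]
        if gap > 0:
            label = "above null"
        elif gap < 0:
            label = "below null"
        else:
            label = "at null"
        result[disease] = (gap, label)
    return result


gaps = compare(observed, null_model)
for disease, (gap, label) in gaps.items():
    print(f"{disease}: {gap:+d} percentage points ({label})")
```

      Laid out this way, cancer and Alzheimer's both fall below the null prediction while diabetes falls above it, which is why the reviewer asks for all three diseases to be discussed rather than cancer alone.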

    2. eLife assessment

      This valuable work explores death coding data to understand the impact of COVID-19 on cancer mortality. The work provides solid evidence that deaths with cancer as a contributing cause were not above what would be expected during pandemic waves, suggesting that cancer did not strongly increase the risk of dying of COVID-19. These results are an interesting exploration into the coding of causes of death that can be used to make sense of how deaths are coded during a pandemic in the presence of other underlying diseases, such as cancer.

    3. Reviewer #2 (Public Review):

      The article is very well written, and the approach is quite novel. I have two major methodological comments which, if addressed, will add to the robustness of the results.

      (1) Model for estimating expected mortality. There is a large literature using different models to predict expected mortality during the pandemic. Different models come with different caveats, see the example of the WHO estimates in Germany and the performance of splines (Msemburi et al Nature 2023 and Ferenci BMC Medical Research Methodology 2023). In addition, it is a common practice to include covariates to help the predictions (e.g., temperature and national holidays, see Kontis et al Nature Medicine 2020). Last, fitting the model independently for each region neglects potential correlation patterns in the neighbouring regions, see Blangiardo et al 2020 PlosONE.

      Based on the above:

      a. I believe that the authors need to run a cross-validation to justify model performance. I would suggest training the data leaving out the last year for which they have mortality and assessing how the model predicts forward. Important metrics for the prediction performance include mean square error and coverage probability, see Konstantinoudis et al Nature Communications 2023. The authors need to provide metrics for all regions and health outcomes.

      b. In the context of validating the estimates, I think the authors need to carefully address the Alzheimer case, see Figure 2. It seems that the long-term trends pick an inverse U-shape relationship which could be an overfit. In general, polynomials tend to overfit (in this case the authors use a polynomial of second degree). It would be interesting to see how the results change if they also include a cubic term in a sensitivity analysis.

      c. The authors can help with the predictions using temperature and national holidays, but if they show in the cross-validation that the model performs adequately, this would be fine.

      d. It would be nice to see a model across the US, accounting for geography and spatial correlation. If the authors don't want to fit conditional autoregressive models in the Bayesian framework, they could just use a random intercept per region.
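      The hold-out check suggested in points a and b can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' model: the weekly death counts are synthetic, the baseline is an ordinary least-squares fit of a polynomial time trend plus one annual harmonic, and the 95% interval is a simple Gaussian band from the training residuals that ignores parameter uncertainty. The names `design` and `holdout_metrics` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic weekly death counts over 6 years (hypothetical data).
weeks_per_year, n_years = 52, 6
t = np.arange(weeks_per_year * n_years, dtype=float)
deaths = (1000 + 0.3 * t
          + 50 * np.cos(2 * np.pi * t / weeks_per_year)
          + rng.normal(0, 20, t.size))


def design(t, degree, period=52):
    """Design matrix: polynomial trend of given degree plus one annual harmonic."""
    ts = t / t.max()  # rescale time to [0, 1] for numerical stability
    trend = [ts ** k for k in range(degree + 1)]
    harmonic = [np.cos(2 * np.pi * t / period), np.sin(2 * np.pi * t / period)]
    return np.column_stack(trend + harmonic)


def holdout_metrics(t, y, degree, n_test):
    """Fit on all but the last n_test points, then score the forward prediction."""
    X = design(t, degree)
    X_train, y_train = X[:-n_test], y[:-n_test]
    X_test, y_test = X[-n_test:], y[-n_test:]
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    resid = y_train - X_train @ beta
    sigma = resid.std(ddof=X.shape[1])  # residual SD on the training period
    pred = X_test @ beta
    mse = float(np.mean((y_test - pred) ** 2))
    # Share of held-out points inside the approximate 95% band.
    coverage = float(np.mean(np.abs(y_test - pred) <= 1.96 * sigma))
    return mse, coverage


# Compare the quadratic trend with a cubic one as a sensitivity check.
results = {d: holdout_metrics(t, deaths, d, weeks_per_year) for d in (2, 3)}
for d, (mse, cov) in results.items():
    print(f"degree {d}: MSE = {mse:.1f}, 95% coverage = {cov:.2f}")
```

      In a real analysis the same two metrics would be reported per region and per cause of death, and a coverage far from the nominal 95% or a sharp change in error between the quadratic and cubic fits would point to the overfitting concern raised for the Alzheimer's series.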

      (2) I think the demographic model needs further elaboration. It would be nice to show more details, the mathematical formula of this model in the supplement, and explain the assumptions.

    1. eLife assessment

      This valuable study advances our understanding of the below-ground resource acquisition strategies of diverse tree species, integrating the roles of both roots and their associated microbes. The support for the conclusions is incomplete owing to the uncertainties or shortcomings associated with the design and statistical analyses. Regardless of these technical issues, this study can be of broad interest for plant and microbial ecologists.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Wu et al. investigated the microbiome in the rhizosphere and roots of plant species along an elevational gradient. They found that: (i) plants with higher root nitrogen ("fast" strategy) were more likely to be associated with saprotrophic fungi, plant pathogenic fungi, and AM fungi, but plants with lower root nitrogen ("slow" strategy) were more likely to be associated with ectomycorrhizal fungi; (ii) bacterial functional guilds were associated with root-zone pH but not root traits.

      Strengths:

      This study is novel in the sense that it revealed the associations between microbiome and trait dimensions of plants. This has been rarely explored even though we acknowledge the importance of plant-microbe interactions.

      Weaknesses:

      The authors tried to include the relative abundances of bacterial and fungal guilds into the root economics framework, which I disagree with because they are just associated with the root economics framework. The title also states that the authors' aim is to link microbial functional guilds to root economics. Therefore, I would suggest that the analyses should be redone to elaborate on the relationships between microbiome and root functional traits.

      Below I provide some critiques and comments that outline my concerns and provide recommendations to hopefully improve the current manuscript.

      -Figures 2 and 3: The authors included soil properties, relative abundances of bacterial or fungal guilds, and root traits in the root economics spectrum. However, soil properties and relative abundances of bacterial or fungal guilds are not root traits, they are just associated with root traits. These bacterial or fungal guilds are the consequence of root traits. Also, the authors did not elaborate on the root trait dimensions of the plants. The only trait dimension they discussed is the "fast-slow" axis. Therefore, I would suggest the authors first analyze the trait dimensions of plants by only using the root traits (PCA), and then explore how the soil properties and relative abundances of bacterial or fungal guilds are associated with the trait dimensions (e.g., envfit in the vegan package).

      -When exploring the associations between microbial functional guilds and root traits, it is unnecessary to analyze the bacterial and fungal functional guilds separately. The bacterial and fungal functional guilds can be included in the same models, and their relative importance and patterns can be compared.

      -For fungi, the authors used FUNGuild to infer functional guilds from taxonomy. qPCR was also performed to validate the results of AMF. This is fantastic. For bacteria, the authors used FAPROTAX to infer functional guilds from taxonomy. However, archaea are also considered in some functions in FAPROTAX. For example, both bacteria (ammonia-oxidizing bacteria) and archaea (ammonia-oxidizing archaea) play critical roles in nitrification. I would assume the authors have removed archaea from the dataset because they stated that the functions of bacteria are inferred from FAPROTAX. Therefore, the importance of nitrification might be underestimated.

      -Key methodological details are missing. First, maps of the sampling site and plots are missing. It would be great if the authors provided maps showing the location of the sampling site and the spatial distribution of the 11 plots. Second, in lines 304-306 the authors claimed that they sampled the most common species in the plots, but they did not provide the coverage or relative abundances of plant species in the plots.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors aimed to determine to what extent root morphology, chemistry, and soil characteristics explained the relative abundance of functional groups of bacteria and fungi associated with roots. To do so, they sampled roots and rhizospheric soil of trees along an elevation gradient. This type of work is common in the field of microbial ecology. The main novelties I see are two: a) a focus on the functional groups of bacteria and fungi rather than just taxonomic abundance. I think this approach is valuable because it provides information about the potential functions of these microorganisms; b) using the root economic spectrum to frame the findings. The root economic spectrum reflects a gradient along which plant roots can be allocated from 'short-lived that provide fast investment return' to 'long-lived that provide a slow investment return'. It is logical to expect (as the authors did) that variation along this gradient will be an important factor in explaining the variation in functional groups.

      Strengths:

      The main strength is using the root economic spectrum as a framework to interpret the data. There are countless studies addressing variation in the relative abundance of microbial communities along environmental gradients which tend to be more descriptive. I think using this framework advances the field by suggesting that while the root economic spectrum exists it is not a very important explanatory variable to predict changes in functional diversity. I also think the authors use state-of-the-art methods to collect and process the samples (i.e. to obtain the data).

      Weaknesses:

      The main weakness is with the presentation of statistical methods as it currently stands. The authors use distance-based redundancy analysis as the main statistical method. However, my understanding is that this method is not advised for the relative abundance of communities, at least not with Euclidean distances, which is the default option of the function dbrda in vegan. The use of this distance would group together communities with no species in common as close to each other (which is an incorrect interpretation). I think the authors should specify what distance they use. My guess is that they actually used Bray-Curtis, in which case this weakness does not apply. However, as it stands it is not specified what metric they use, and if they indeed use Euclidean distances it may lead to inaccurate conclusions. In addition, they also mention they use PCA on the relative abundance of functional groups. By definition, PCA is also based on Euclidean distances, which gives a similar problem as dbrda. Thus, I encourage the authors to use Bray-Curtis distance and specify it in the text.
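      The concern about Euclidean distances on relative abundances can be seen in a toy example. The three communities below are hypothetical and are not taken from the study; the sketch uses plain NumPy rather than vegan, and `euclidean` and `bray_curtis` are helper names defined only for this illustration.

```python
import numpy as np

# Relative abundances of 8 taxa in three hypothetical communities.
A = np.array([0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0])
B = np.array([0.0, 0.0, 0.0, 0.0, 0.25, 0.25, 0.25, 0.25])  # shares no taxa with A
C = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])      # shares one taxon with A


def euclidean(x, y):
    """Euclidean distance between two abundance vectors."""
    return float(np.sqrt(np.sum((x - y) ** 2)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum of absolute differences over total abundance."""
    return float(np.sum(np.abs(x - y)) / np.sum(x + y))


print("Euclidean   A-B:", round(euclidean(A, B), 3), " A-C:", round(euclidean(A, C), 3))
print("Bray-Curtis A-B:", round(bray_curtis(A, B), 3), " A-C:", round(bray_curtis(A, C), 3))
```

      Under the Euclidean distance, A appears closer to B (no shared taxa) than to C (one shared taxon), whereas Bray-Curtis assigns the A-B pair its maximum dissimilarity of 1 and ranks A-C as more similar.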

    4. Reviewer #3 (Public Review):

      Summary:

      In this study, the authors collected a large set of data on root traits and root-associated microbes in the root endosphere and rhizosphere in order to integrate these important organisms in the root economics spectrum. By sampling a relatively large set of species from the subtropics along an elevation gradient, they tested whether microbial functions covary with root traits and root trait axes and if so, aimed to discuss what this could tell us about the (belowground) functioning of trees and forests.

      Strengths:

      The strengths of this study lie mostly in the impressive dataset the authors compiled: they sampled belowground properties of a relatively large number of tree species from an understudied region: i.e., the subtropics, where species-level root data are notoriously scarce. Secondly, their extensive sampling of associated microbes to integrate them in the root economics space is an important quality, because of the strong associations between roots and fungi and bacteria: soil microbes are directly related to root form (e.g., mycorrhizal fungi and root diameter and SRL), and function (e.g., taking up soil nutrients from various sources). Thirdly, the PCA figures (Figures 2 and 3) look very nice and intuitive and the paper is very well written.

      Weaknesses:

      That said, this study also has several methodological weaknesses that make the results, and therefore the impact of this study, difficult to evaluate and interpret.

      (1) Design: The design of this study needs further explanation and justification in the Introduction and Methods sections in order to understand the ecological meaning of the results. Root traits and microbial community composition differ with their environment, and therefore (likely) also with elevation. Elevation is included in the redundancy analysis as a main effect, but without further environmental information, its impact is not ecologically meaningful. What is the rationale for including an elevation gradient in the design and as a main effect in the analyses? Do environmental conditions vary across altitudes and how, and if so, how would this impact the data?

      What is the rationale behind sampling endosphere and rhizosphere microbial communities - why do both? And why also include pathogens - what are their expected roles in the RES? What do we know about this already? The introduction needs a more extensive literature review of these additional variables that are included in the analyses.

      (2) Units of replication and analysis in the model: What are the units of replication and analyses, e.g., how many trees were sampled per species, how many species or trees per elevation, and how many plots per elevation? Were all 11 plots at different elevations and if so, which ones? The level of analysis for the redundancy analyses is not entirely clear: L. 404 mentions that the analyses were done 'across the rhizosphere and root tissue samples', but is that then at the individual-tree level? If so, it seems that these analyses should then also account for dependencies between trees from the same species and phylogeny (as (nested) covariates or random factors). With the information provided, I cannot tell whether there was sufficient replication for statistical interpretations.

      (3) PCA: The results of the parallel analyses are not described: which components were retained? Because the authors aim to integrate microbial functions in a root economics space, I recommend first demonstrating the existence of a root economics space across the 52 subtropical species before running a PCA that includes the microbial traits. The PCA shown in this study does not exactly match the RES and this could be because traits of these species covary differently, but may also simply result from including additional traits to the PCA.

      Also, the PCAs shown are carried out at the individual-tree level. I would recommend, however, including the species-level PCAs in the main text, because the individual-level PCA may not only reflect species-inherent ecological strategies (that e.g., the RES by Bergmann et al. 2020 describe) but also plasticity (Figures 2 and 3 both show an elevation effect that may be partly due to plasticity). While the results here are rather similar, intraspecific differences in root traits may follow different ecological principles and therefore not always be appropriate to compare with an interspecific RES (see for example Weemstra & Valverde-Barrantes, 2022, Annals of Botany).

      I could not deduce whether tree species in the "fungal PCA" (Figure 2) were assigned as AM or EcM based on Table 1, or based on their observed fungal community composition. In the former case, the fungal functional guild gradient (from EcM to saprotrophs and AM) is partially an artificial one, because EcM tree species are not AM species (according to Table 1) and therefore, by definition, constitute a tradeoff or autocorrelation. And, as the authors also discuss, AM tree species may host EcM fungal species. Before I can evaluate the ecological meaning of PC1, and whether or not it really represents a mineral/organic nutrient gradient, information is needed on which data are used here.

      I do not agree with the term 'gradient of bacterial guilds' (i.e., PC1 in Figure 3). All but 1 bacterial 'function' positively loaded on PC1 and 'fermentation' was only weakly negatively correlated with PC1. I do not think this constitutes a 'bacterial gradient'.

      (4) Soil samples: Were they collected from the surrounding soil of each tree (L. 341), or from the root zone (L. 110)? The former seems to refer to bulk soil samples, but the latter could be interpreted as rhizosphere soils. It is therefore not entirely clear whether these are the same soil samples, and if so, where they were sampled exactly.

      Aims:

      The authors aimed to integrate endospheric and rhizospheric microbial and fungal community composition in the root economics space. Owing to statistical concerns (i.e., lacking parallel analysis results and the makeup of the PCs (AM versus EcM classification)), I am not sure the authors succeeded in this. Besides that, the interpretation of the axes seems rather oversimplified and needs some consideration.

      Root N is discussed as an important driver of fungal functional composition. Indeed, it was one of the significant variables in the redundancy models predicting microbial community composition, but its contribution to community composition was small (2 - 3 %), and the mechanistic interpretation was rather speculative. Specifically, the role of root N in root (and tree) functioning remains highly uncertain: the link with respiration and exudation is increasingly demonstrated but its actual meaning for nutrient uptake is not well understood (Freschet et al. 2021. New Phytologist). If and how root economics (represented by root N) and the fungal-driven nutrient economy (EcM versus AM, saprotrophs) can indeed be integrated into a unified framework (L. 223 - 224) seems a relevant question that is worth pursuing based on this paper, but in my opinion, this study does not clearly answer it, because the statistical analyses might need further work (or explanation) and underlying mechanisms are not well explained and supported by evidence.

      In addition, the root morphology axis was indeed independent of the "fungal gradient", but this is in itself not an interesting finding. What is interesting, but not discussed is that, generally, AM species are expected to have thicker roots than EcM tree species (Gu et al. 2014 Tree Physiology; Kong et al. 2014 New Phytologist). I am therefore curious to see why this is not the case here? Did the few EcM species sampled just happen to have very thick roots? Or is there a phylogenetic effect that influences both mycorrhizal type and root thickness that is not accounted for here (Baylis, 1975; Guo et al., 2008 New Phytologist; Kubisch et al., 2015 Frontiers in Plant Science; Valverde-Barrantes et al., 2015 Functional Ecology; 2016 Plant and Soil)?

      I also do not agree with the conclusion that this integrated framework 'explained' tree distributions along the elevation gradient. First of all, it is difficult to interpret because the elevation gradient is not well explained (e.g., in terms of environmental variation). Secondly, the framework might coincide with the elevation gradient, but the framework does not explain it: an environmental gradient probably underlies the elevation gradient and may select for species with certain root traits or mycorrhizal types, but this is not tested nor clearly demonstrated by the data. It thus remains rather speculative, and it should be more thoroughly explained based on the data observed. Similarly, I do not understand from this study how root traits like root N can influence the abundance of EcM and pathogenic fungi (L. 242 - 243). Which data show this causality? It seems a strong statement, but not well supported (or explained).

      Impact:

      The data collected for this study are timely, valuable, and relevant. Soilborne microbes (fungi and bacteria; symbionts and pathogens) play important roles in root trait expressions (e.g., root diameter) and below-ground functioning (e.g., resource acquisition). They should therefore not be excluded from studies into the belowground functioning of forests, but they mostly are. This dataset therefore has the potential to improve our understanding of this subject. Making these data publicly available in large-scale datasets that have recently been initiated (e.g., FRED) will also allow further study in comparative (with other biomes) or global (across biomes) studies.

      Technically, the methodology seems sound, although I lack the expertise to judge the Molecular Methods (L. 349 - 397). However, owing to some statistical uncertainties mentioned above (that the authors might well clarify or improve) and the oversimplified discussion, I am hesitant to determine the impact of the contents of this work. Statistical improvements and/or clearer explanation/justification of statistical choices made can make this manuscript highly interesting and impactful, however.

      Context:

      As motivated above, I am not sure to what extent the EcM - AM/saprotroph gradient presents a true ecological tradeoff. However, if it does, this work would fit very well in the context of the mycorrhizal-associated nutrient economy (MANE; Phillips et al. 2013 New Phytologist). This theory postulates that EcM trees generally produce low-quality litter (associated with 'slow traits') that can be more readily accessed by EcM but not AM fungi, thereby slowing down nutrient cycling rates to their competitive advantage, and vice versa for AM tree species. This study did not aim to test the MANE, so it was beyond its scope to study litter quality, and the number of EcM and AM species was unbalanced (8 EcM versus 44 AM species): nonetheless, the denser roots of EcM species and higher root N of AM species indicate that the MANE may also apply to this subtropical forest and may be an interesting impetus for future work on this topic. It might also offer one way to bridge the root economics space and the MANE.

      What I also found interesting is the sparse observations of EcM fungal taxa in the root endosphere of species typically identified as AM hosts (L. 212 - 214). While their functionality remains to be tested (fungal structures in the endosphere were not studied here), this observation might call for renewed attention to classifying species as AM, EcM, or both.

    5. Reviewer #4 (Public Review):

      Summary:

      Recent progress in root economics has revealed global-scale axes of covaried root traits that reflect various root resource acquisition strategies. These covariance patterns are powerful tools for understanding root functional diversity. However, roots do not function in isolation for below-ground resource acquisition. Rather, symbiotic fungi and rhizosphere microorganisms often collaborate with plant roots, forming a root-microbial-soil continuum. This study seeks to provide novel insights into this continuum by extending the existing framework of root economics to include the structures of root-associated microorganisms. I find this topic highly relevant. Considering the role of soil microorganisms is undoubtedly crucial for a more comprehensive understanding of below-ground resource strategies.

      Major comments:

      A key finding of this study is a relationship between root N and the tendency for roots to associate with particular types of mycorrhizal associations (Line 27, Fig. 2). The authors concluded that this indicates "a linkage from simple root traits to fungal-mediated carbon nutrient cycling" (line 27) and integrates "microbial functions into the root economics framework," (line 32). If substantiated, this correlation could represent a significant discovery about the connection between root functional traits and root-associated fungi. It suggests that low root N, indicative of low metabolic activity within the root economics framework, is linked with forming EcM associations. However, I am not fully convinced this is the case based on the current data presentation and interpretation.

      First, there is no biological interpretation of this relationship between root N and mycorrhizal type. It merely noted that root N is indicative of root metabolic activity, and thus by relating root N to fungal composition, "the trait-related root economics and fungal-driven nutrient economics may be integrated into a unified framework" (lines 221-224). Why would roots with low N and low metabolic activity tend to favor EcM associations? What are the potential mechanisms? Biological interpretation is essential for understanding whether a statistical correlation reflects a causal and meaningful relationship or is coincidental.

      I am also concerned that this relationship may be spurious, especially when it lacks biological interpretation. EcM is underrepresented in this study (8 EcM species, of which more than half are conifers and oaks vs. 44 AM) and seems to cluster at higher elevations (line 231). Thus, the tree species/individual data points are not independent, but phylogenetically and geographically clustered. The unique properties at higher elevations (e.g., distinct plant community structures, low levels of mineral N) may drive both the lower root N and the prevalence of EcM associations. This scenario aligns with the observation that at higher elevations, AM roots also exhibited low root N (Line 231). In this case, root N may not directly relate to mycorrhizal type but is characteristic of certain locations (or closely related species), and it would be misleading to suggest that low root N/metabolic activity, a proxy in fast-slow root economics, is directly linked to the preference for a particular mycorrhizal type (lines 27-28, 220 - 224). In summary, because the studied tree species appear to be clustered both phylogenetically and geographically, these factors need to be carefully taken into account in the statistical analysis and data interpretation to understand the underlying causes of the apparent relationship and prevent overinterpretation. I also recommend, if possible, providing a visual presentation of the geographical and phylogenetic distribution of the studied tree species.

      That being said, this dataset is undoubtedly valuable in revealing the shifts in the compositional structures of root-associated soil microorganisms. However, integrating the traits of microbial composition to root trait economics would require more caution and careful examination of the potential driving causes.

    1. eLife assessment

      This study divided structural brain aging into two groups, revealing that one group is more vulnerable to aging and brain-related diseases compared to the other group. This study is valuable as such subtyping could be utilized in predicting and diagnosing cognitive decline and neurodegenerative brain disorders in the future. However, the support for the authors' claims remains incomplete, as there appears to be a lack of connection between the analysis results and those claims.

    2. Reviewer #1 (Public Review):

      Summary:

      Duan et al analyzed brain imaging data from the UK Biobank and characterized how brain structure changes with aging. They identified two patterns of brain aging and found associations that differ between the two resulting groups.

      Strengths:

      This discovery could have a substantial impact on our understanding of aging and of brain structure and function.

      Weaknesses:

      Given this potential impact, the study requires more validation. Most importantly, the data underlying the stratification into the two groups are not obvious and lack further details. Can the participants also be stratified by different methods, e.g., PCA?

      Are there any external data that can be used for validation?

      Other previous discoveries or claims supporting the results of the study should be explored to support the conclusion.

      Sex was merely used as a covariate. Were there sex differences during brain aging? What was the sex ratio difference in groups 1 and 2?

      Although statistically significant, Figure 3 shows minimal differences. LTL and phenoAge are displayed in adjusted values but what are the actual values that differ between patterns 1 and 2?

      It is not intuitive to link gene expression results shown in Figure 8 and brain structure and functional differences between patterns 1 and 2. Any overlap of genes identified from analyses shown in Figure 6 (GWAS) and 8 (gene expression)?

    3. Reviewer #2 (Public Review):

      Summary:

      The authors aimed to understand the heterogeneity of brain aging by analyzing brain imaging data. Based on the concept of structural brain aging, they divided participants into two groups based on the volume and rate of decrease of gray matter volume (GMV). The group with rapid brain aging showed accelerated biological aging and cognitive decline and was found to be vulnerable to certain neuropsychiatric disorders. Furthermore, the authors claimed the existence of a "last in, first out" mirroring pattern between brain aging and brain development, which they argued is more pronounced in the group with rapid brain aging. Lastly, the authors identified genetic differences between the two groups and speculated that the cause of rapid brain aging may lie in genetic differences.

      Strengths:

      The authors supported their claims by analyzing a large amount of data using various statistical techniques. There seems to be no doubt about the quality and quantity of the data. Additionally, they demonstrated their strength in integrating diverse data through various analysis techniques to reach their conclusions.

      Weaknesses:

      There appears to be a lack of connection between the analysis results and their claims. Readers lacking sufficient background knowledge of the brain may find it difficult to understand the paper. It would be beneficial to modify the figures and writing to make the authors' claims clearer to readers. Furthermore, the paper gives an overall impression of being less polished in terms of abbreviations, figure numbering, etc. These aspects should be revised to make the paper easier for readers to understand.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study presents fundamental new insights into vesicular monoamine transport and the binding pose of the clinical drug tetrabenazine (TBZ) to the mammalian VMAT2 transporter. Specifically, this study reports the first structure for the mammalian VMAT (SLC18) family of vesicular monoamine transporters. It provides insights into the mechanism by which this inhibitor traps VMAT2 into a 'dead-end' conformation. The structure also provides some evidence for a novel gating mechanism within VMAT2, which may have wider implications for understanding the mechanism of transport in the wider SLC18 family.

      Strengths:

      The structure is high quality, and the method used to determine the structure via fusing mVenus and the anti-GFP nanobody to the amino and carboxyl termini is novel. The binding and transport data are convincing and provide new insights into the role of conserved side chains within the SLC18 members. The binding position of TBZ is of high value, given its role in treating Huntington's chorea and for being a 'dead-end' inhibitor for VMAT2.

      We thank reviewer #1 for their constructive comments and input, which we feel have greatly improved the manuscript.

      Reviewer #2 (Public Review):

      This public review is the same review that was posted earlier and has not been updated in response to our comments or to the revised manuscript. Please see our earlier response to these comments. We thank reviewer #2 for their input and we have incorporated many of these suggestions into our revised manuscript. With regard to the question of ‘how TBZ got there’, we have revised this sentence in the discussion to be more speculative. As pointed out earlier, our interpretation of the structure is based on a wealth of experimental and structural data which support our interpretations. Thus, our conclusions have not been overstated. This has been explained in our earlier public response and these key studies have been cited throughout the manuscript. We also note that reviewer #3 found the AlphaFold comparisons to be quite meaningful.

      Overview:

      As a report of the first structure of VMAT2, indeed the first structure of any vesicular monoamine transporter, this manuscript represents an important milestone in the field of neurotransmitter transport. VMAT2 belongs to a large family (the major facilitator superfamily, MFS) containing transporters from all living species. There is a wealth of information relating to the way that MFS transporters bind substrates, undergo conformational changes to transport them across the membrane and couple these events to the transmembrane movement of ions. VMAT2 couples the movement of protons out of synaptic vesicles to the vesicular uptake of biogenic amines (serotonin, dopamine and norepinephrine) from the cytoplasm. The new structure presented in this manuscript can be expected to contribute to an understanding of this proton/amine antiport process.

      The structure contains a molecule of the inhibitor TBZ bound in a central cavity, with no access to either luminal or cytoplasmic compartments. The authors carefully analyze which residues interact with bound TBZ and measure TBZ binding to VMAT2 mutated at some of those residues. These measurements allow well-reasoned conclusions about the differences in inhibitor selectivity between VMAT1 and VMAT2 and differences in affinity between TBZ derivatives.

      The structure also reveals polar networks within the protein and hydrophobic residues in positions that may allow them to open and close pathways between the central binding site and the cytoplasm or the vesicle lumen. The authors propose involvement of these networks and hydrophobic residues in coupling of transport to proton translocation and conformational changes. However, these proposals are quite speculative in the absence of supporting structures and experimentation that would test specific mechanistic details.

      Critique:

      Although the structure presented in this MS is clearly important, I feel that the authors have overstated several of the conclusions that can be drawn from it. I don't agree that the structure clearly indicates why TBZ is a non-competitive inhibitor; the proposal that specific hydrophobic residues function as gates will depend on lumen- and cytoplasm-facing structures for verification; the polar networks could have any number of functions - indeed it would be surprising if they were all involved in proton transport. Several of these issues could be resolved by a clearer illustration of the data, but I believe that a more rigorous description of the conclusions and where they fall between firm findings and speculation would help the reader put the results in perspective.

      Non-competitive inhibition occurs when the action of an inhibitor can't be overcome by increasing substrate concentration. The structure shows TBZ sequestered in the central cavity with no access to either cytoplasm or lumen. The explanation of competitive vs non-competitive inhibition depends entirely on how TBZ got there. If it bound from the cytoplasm, cytoplasmic substrate should have been able to compete with TBZ and overcome the inhibition. If it bound from the lumen, or from within the bilayer, cytoplasmic substrate would not be able to compete, and inhibition would be non-competitive. The structure does not tell us how TBZ got there, only that it was eventually occluded from both aqueous compartments and the bilayer.

      The issue of how VMAT2 opens access to the central binding site from luminal and cytoplasmic sides is an important and interesting one, and comparison with other MFS structures in cytoplasmic-open or extracellular/luminal-open conformations is a very reasonable approach. However, any conclusions for VMAT2 should be clearly indicated as speculative in the absence of comparable open structures of VMAT2. As a matter of presentation, I found the illustrations in ED Fig. 6 to be less helpful than they could have been. Specifically, illustrations that focus on the proposed gates, comparing that region of the new structure with the corresponding region of either VGLUT or GLUT4, would better help the reader to compare the position of the proposed gate residues with the corresponding region of the open structure. I realize that is the intended purpose of ED Fig. 6b and 6c, but currently, those show the entire protein and a focus on the gate regions might make the proposed gate movements clearer. I also appreciate the difference between the AlphaFold prediction and the new structure, but I'm not convinced that ED Fig. 6a adds anything helpful.

      The polar networks described in the manuscript provide interesting possibilities for interactions with substrates and protons whose binding to VMAT2 must control conformational change. Aside from the description of these networks, there is little evidence presented to assess the role of these networks in transport. Are the networks conserved in other closely related transporters? How could the interaction of the networks with substrate or protons affect conformational change? Of course, any potential role proposed for the networks would be highly speculative at this point, and any discussion of their role should point out their speculative nature and the need for experimental verification. Some speculation, however, can be useful for focusing the field's attention on future directions. However, statements in the abstract (three distinct polar networks... play a role in proton transduction.) and the discussion (...are likely also involved in mediating proton transduction.) should be clearly presented as speculation until they are validated experimentally.

      The strongest aspect of this work (aside from the structure itself) is the analysis of TBZ binding. I will comment on some minor points below, but there is one problematic aspect to this analysis. The discussion on how TBZ stabilizes the occluded conformation of VMAT2 is premature without structures of apo-VMAT2 and possibly structures with other ligands bound. We don't really know at this point whether VMAT2 might be in the same occluded conformation in the absence of TBZ. Any statements regarding the effect of interactions between VMAT2 and TBZ depend on demonstrating that TBZ has a conformational effect. The same applies to the discussion of the role of W318 on conformation and to the loops proposed to "occlude the luminal side of the transporter" (line 131).

      The description of VMAT2 mechanism makes many assumptions that are based on studies with other MFS transporters. Rather than stating these assumptions as fact (VMAT2 functions by alternating access...), it would be preferable to explain why a reader should believe these assumptions. In general, this discussion presents conclusions as established facts rather than proposals that need to be tested experimentally.

      The MD simulations are not described well enough for a general reader. What is the significance of the different runs? ED Fig. 4d is not high enough resolution to see the details.

      Reviewer #3 (Public Review):

      Summary:

      The vesicular monoamine transporter is a key component in neuronal signaling and is implicated in diseases such as Parkinson's. Understanding of monoamine processing and our ability to target that process therapeutically has been to date provided by structural modeling and extensive biochemical studies. However, structural data is required to establish these findings more firmly.

      Strengths:

      Dalton et al resolved a structure of VMAT2 in the presence of an important inhibitor, tetrabenazine, with the protein in detergent micelles, using cryo-EM and with the aid of protein domains fused to its N- and C-terminal ends, including one fluorescent protein that facilitated protein screening and purification. The resolution of the maps allows clear assignment of the amino acids in the core of the protein. The structure is in good agreement with a wealth of experimental and structural prediction data, and provides important insights into the binding site for tetrabenazine and selectivity relative to analogous compounds. The authors provide additional biochemical analyses that further support their findings. The comparison with AlphaFold models is enlightening.

      We appreciate this summary and thank reviewer #3 for their helpful suggestions to improve the manuscript.

      Weaknesses:

      The authors follow up their structures with molecular dynamics simulations of the tetrabenazine-bound state, and test several protonation states of acidic residues in the binding pocket, but not all possible combinations; thus, it is not clear the extent to which tetrabenazine rearrangements observed in these simulations are meaningful. Additional simulations of the substrate dopamine docked into this structure were also carried out, although it is unclear whether this "dead-end" occluded state is a relevant state for dopamine binding. The authors report release of dopamine during these simulations, but it is notable that this only occurs when all four acidic binding site residues were protonated and when an enhanced sampling approach was applied.

      As an occluded, neurotransmitter-bound structure has yet to be solved experimentally, it is not possible to address whether this state resembles the docked dopamine structure. However, it is reasonable to hypothesize that this is a relevant state for dopamine binding and, if so, these simulations would be of great interest. The MD simulations that were performed are logical, based on the calculated pKa of the residues and the known pH of the vesicle lumen (5.5). Note that we have carried out a total of more than 2 microseconds of simulations, which required a significant computing time/memory allocation for the current runs in explicit water and membrane. Investigating all possible combinations of protonation status for the four highlighted acidic residues alone would require at least 16 independent simulations, each performed in duplicate, not including proper experimental replicates. We do not believe this to be a feasible suggestion, nor necessary given that the selected combinations were based on rational evaluation of on-path amino acids that were assessed to be potentially protonated.

    2. eLife assessment

      The report presents the cryo-EM structure of human vesicular monoamine transporter 2 (VMAT2) bound to tetrabenazine, a clinical drug. VMAT2 is critical for neurotransmission, and the study constitutes an important milestone in neurotransmitter transport research. The evidence presented in the report is convincing and provides new opportunities for developing improved therapeutic interventions and furthering our understanding of this vital protein's function.

    3. Reviewer #1 (Public Review):

      Summary:

      This study presents fundamental new insights into vesicular monoamine transport and the binding pose of the clinical drug tetrabenazine (TBZ) to the mammalian VMAT2 transporter. Specifically, this study reports the first structure for the mammalian VMAT (SLC18) family of vesicular monoamine transporters. It provides insights into the mechanism by which this inhibitor traps VMAT2 into a 'dead-end' conformation. The structure also provides some evidence for a novel gating mechanism within VMAT2, which may have wider implications for understanding the mechanism of transport in the wider SLC18 family.

      Strengths:

      The structure is high quality, and the method used to determine the structure via fusing mVenus and the anti-GFP nanobody to the amino and carboxyl termini is novel. The binding and transport data are convincing and provide new insights into the role of conserved side chains within the SLC18 members. The binding position of TBZ is of high value, given its role in treating Huntington's chorea and for being a 'dead-end' inhibitor for VMAT2.

    4. Reviewer #2 (Public Review):

      As a report of the first structure of VMAT2, indeed the first structure of any vesicular monoamine transporter, this manuscript represents an important milestone in the field of neurotransmitter transport. VMAT2 belongs to a large family (the major facilitator superfamily, MFS) containing transporters from all living species. There is a wealth of information relating to the way that MFS transporters bind substrates, undergo conformational changes to transport them across the membrane and couple these events to the transmembrane movement of ions. VMAT2 couples the movement of protons out of synaptic vesicles to the vesicular uptake of biogenic amines (serotonin, dopamine and norepinephrine) from the cytoplasm. The new structure presented in this manuscript can be expected to contribute to an understanding of this proton/amine antiport process.

      The structure contains a molecule of the inhibitor TBZ bound in a central cavity, with no access to either luminal or cytoplasmic compartments. The authors carefully analyze which residues interact with bound TBZ and measure TBZ binding to VMAT2 mutated at some of those residues. These measurements allow well-reasoned conclusions about the differences in inhibitor selectivity between VMAT1 and VMAT2 and differences in affinity between TBZ derivatives.

      The structure also reveals polar networks within the protein and hydrophobic residues in positions that may allow them to open and close pathways between the central binding site and the cytoplasm or the vesicle lumen. The authors propose involvement of these networks and hydrophobic residues in coupling of transport to proton translocation and conformational changes.

    5. Reviewer #3 (Public Review):

      Summary:

      The vesicular monoamine transporter is a key component in neuronal signaling and is implicated in diseases such as Parkinson's. Understanding of monoamine processing and our ability to target that process therapeutically has been to date provided by structural modeling and extensive biochemical studies. However, structural data is required to establish these findings more firmly.

      Strengths:

      Dalton et al resolved a structure of VMAT2 in the presence of an important inhibitor, tetrabenazine, with the protein in detergent micelles, using cryo-EM and with the aid of protein domains fused to its N- and C-terminal ends, including one fluorescent protein that facilitated protein screening and purification. The resolution of the maps allows clear assignment of the amino acids in the core of the protein. The structure is in good agreement with a wealth of experimental and structural prediction data, and provides important insights into the binding site for tetrabenazine and selectivity relative to analogous compounds. The authors provide additional biochemical analyses that further support their findings. The comparison with AlphaFold models is enlightening.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This manuscript from Mukherjee et al examines potential connections between telomere length and tumor immune responses. This examination is based on the premise that telomeres and tumor immunity have each been shown to play separate, but important, roles in cancer progression and prognosis as well as prior correlative findings between telomere length and immunity. In keeping with a potential connection between telomere length and tumor immunity, the authors find that long telomere length is associated with reduced expression of the cytokine receptor IL1R1. Long telomere length is also associated with reduced TRF2 occupancy at the putative IL1R1 promoter. These observations lead the authors towards a model in which reduced telomere occupancy of TRF2 - due to telomere shortening - promotes IL1R1 transcription via recruitment of the p300 histone acetyltransferase. This model is based on earlier studies from this group (i.e. Mukherjee et al., 2019) which first proposed that telomere length can influence gene expression by enabling TRF2 binding and gene transactivation at telomere-distal sites. Further mechanistic work suggests that G-quadruplexes are important for TRF2 binding to IL1R1 promoter and that TRF2 acetylation is necessary for p300 recruitment. Complementary studies in human triple-negative breast cancer cells add potential clinical relevance but do not possess a direct connection to the proposed model. Overall, the article presents several interesting observations, but disconnection across central elements of the model and the marginal degree of the data leave open significant uncertainty regarding the conclusions.

      Strengths:

      Many of the key results are examined across multiple cell models.

      The authors propose a highly innovative model to explain their results.

      Weaknesses:

      Although the authors attempt to replicate most key results across multiple models, the results are often marginal or appear to lack statistical significance. For example, the reduction in IL1R1 protein levels observed in HT1080 cells that possess long telomeres relative to HT1080 short telomere cells appears to be modest (Supplementary Figure 1I). Associated changes in IL1R1 mRNA levels are similarly modest.

      Related to the point above, a lack of strong functional studies leaves an open question as to whether observed changes in IL1R1 expression across telomere short/long cancer cells are biologically meaningful.

      Statistical significance is described sporadically throughout the paper. Most major trends hold, but the statistical significance of the results is often unclear. For example, Figure 1A uses a statistical test to show statistically significant increases in TRF2 occupancy at the IL1R1 promoter in short telomere HT1080 relative to long telomere HT1080. However, similar experiments (i.e. Figure 2B, Figure 4A - D) lack statistical tests.

      TRF2 overexpression resulted in ~ 5-fold or more change in IL1R1 expression. Compared to this, telomere length-dependent alterations in IL1R1 expression, although about 2-fold, appear modest (~ 50% reduction in cells with long telomeres across different model systems used). Notably, this was consistent and significant across cell-based model systems and xenograft tumors (see Figure 1). Unlike TRF2 induction, telomere elongation or shortening varies within the permissible physiological limits of cells. This is likely to result in the observed variation in IL1R1 levels. For biological relevance, we further demonstrated that IL1 signalling in TNBC tissue and tumor organoids, and M2 macrophage infiltration, was significantly dependent on telomere length. Details of tests of significance were included in the individual figure legends. Based on this comment, we will expand on these details in a dedicated paragraph in the methods section to make the information clearer for readers. We noticed that the stars (*) denoting statistical significance were omitted in some ChIP-experiment figures. This was likely an error during figure assembly for PDF conversion. We thank the reviewer for bringing this up; necessary changes will be made in the revised manuscript.

      Reviewer #2 (Public Review):

      This study highlights the role of telomeres in modulating IL-1 signaling and tumor immunity. The authors demonstrate a strong correlation between telomere length and IL-1 signaling by analyzing TNBC patient samples and tumor-derived organoids. Mechanistic insights revealed non-telomeric TRF2 binding at the IL-1R1. The observed effects on NF-kB signaling and subsequent alterations in cytokine expression contribute significantly to our understanding of the complex interplay between telomeres and the tumor microenvironment. Furthermore, the study reports that the length of telomeres and IL-1R1 expression is associated with TAM enrichment. However, the manuscript lacks in-depth mechanistic insights into how telomere length affects IL-1R1 expression. Overall, this work broadens our understanding of telomere biology.

      The mechanism of how telomere length affects IL1R1 expression involves sequestration and reallocation of TRF2 between telomeres and gene promoters (in this case, the IL1R1 promoter). We have previously shown this across multiple genomic sites (Mukherjee et al, 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). We have described this in the manuscript along with references citing the previous works. A scheme explaining the model was provided as Additional Supplementary Figure 1, along with a description of the mechanistic model.

      Figures 1-4 of the main figures describe the molecular mechanism of telomere-dependent IL1R1 activation. This includes ChIP data for TRF2 on the IL1R1 promoter in long/short telomeres, as well as TRF2-mediated histone/p300 recruitment and IL1R1 gene expression. We further show how specific acetylation on TRF2 is crucial for TRF2-mediated IL1R1 regulation (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, entitled "Telomere length sensitive regulation of Interleukin Receptor 1 type 1 (IL1R1) by the shelterin protein TRF2 modulates immune signalling in the tumour microenvironment", Dr. Mukherjee and colleagues aimed to clarify the extra-telomeric role of TRF2 in regulating IL1R1 expression, with a consequent impact on TAM tumor infiltration.

      Strengths:

      Upon careful manuscript evaluation, I feel that the presented story is undoubtedly well conceived. At the technical level, experiments have been properly performed and the obtained results support the authors' conclusions.

      Weaknesses:

      Unfortunately, the covered topic is not particularly novel. In detail, the TRF2 capability of binding extratelomeric foci in cells with short telomeres has been well demonstrated in a previous work published by the same research group. The capability of TRF2 to regulate gene expression is well-known; the capability of TRF2 to interact with p300 has already been demonstrated; and, finally, the capability of TRF2 to regulate TAM infiltration (that is the effective novelty of the manuscript) appears as an obvious consequence of IL1R1 modulation (this is probably due to the current manuscript organization).

      Here we studied the TRF2-IL1R1 regulatory axis (not reported earlier by us or others) as a case of the telomere sequestration model that we described earlier (Mukherjee et al., 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). This manuscript demonstrates the effect of the TRF2-IL1R1 regulation on telomere-sensitive tumor macrophage recruitment. To the best of our knowledge, no previous study connects telomeres of tumor cells mechanistically to the tumor immune microenvironment. Here we focused on the IL1R1 promoter and provided mechanistic evidence for acetylated-TRF2 engaging the HAT p300 for epigenetically altering the promoter. This mechanism of TRF2 mediated activation has not been previously reported. Further, the function of a specific post translational modification (acetylation of the lysine residue 293K) of TRF2 in IL1R1 regulation is described for the first time. Additional experiments showed that TRF2-acetylation mutants, when targeted to the IL1R1 promoter, significantly alter the transcriptional state of the IL1R1 promoter. To our knowledge, the function of any TRF2 residue in transcriptional activation had not been previously described. Taken together, these demonstrate novel insights into the mechanism of TRF2-mediated gene regulation, that is telomere-sensitive, and affects the tumor-immune microenvironment. We are considering the suggestion to reorganize the manuscript to highlight the novel aspects of our work more convincingly.

    2. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, entitled "Telomere length sensitive regulation of Interleukin Receptor 1 type 1 (IL1R1) by the shelterin protein TRF2 modulates immune signalling in the tumour microenvironment", Dr. Mukherjee and colleagues aimed to clarify the extra-telomeric role of TRF2 in regulating IL1R1 expression, with a consequent impact on TAM tumor infiltration.

      Strengths:

      Upon careful manuscript evaluation, I feel that the presented story is undoubtedly well conceived. At the technical level, experiments have been properly performed and the obtained results support the authors' conclusions.

      Weaknesses:

      Unfortunately, the covered topic is not particularly novel. In detail, the TRF2 capability of binding extratelomeric foci in cells with short telomeres has been well demonstrated in a previous work published by the same research group. The capability of TRF2 to regulate gene expression is well-known; the capability of TRF2 to interact with p300 has already been demonstrated; and, finally, the capability of TRF2 to regulate TAM infiltration (that is the effective novelty of the manuscript) appears as an obvious consequence of IL1R1 modulation (this is probably due to the current manuscript organization).

    3. eLife assessment

      This study presents an important finding on the role of telomeres in modulating interleukin-1 signaling and tumor immunity in TNBC. The evidence supporting these findings is solid, presented through comprehensive analyses including TNBC clinical samples, tumor-derived organoids, cancer cells, and xenografts. The work will be of broad interest to cell and medical biologists focusing on TNBC.

    4. Reviewer #1 (Public Review):

      Summary:

      This manuscript from Mukherjee et al examines potential connections between telomere length and tumor immune responses. This examination is based on the premise that telomeres and tumor immunity have each been shown to play separate, but important, roles in cancer progression and prognosis as well as prior correlative findings between telomere length and immunity. In keeping with a potential connection between telomere length and tumor immunity, the authors find that long telomere length is associated with reduced expression of the cytokine receptor IL1R1. Long telomere length is also associated with reduced TRF2 occupancy at the putative IL1R1 promoter. These observations lead the authors towards a model in which reduced telomere occupancy of TRF2 - due to telomere shortening - promotes IL1R1 transcription via recruitment of the p300 histone acetyltransferase. This model is based on earlier studies from this group (i.e. Mukherjee et al., 2019) which first proposed that telomere length can influence gene expression by enabling TRF2 binding and gene transactivation at telomere-distal sites. Further mechanistic work suggests that G-quadruplexes are important for TRF2 binding to IL1R1 promoter and that TRF2 acetylation is necessary for p300 recruitment. Complementary studies in human triple-negative breast cancer cells add potential clinical relevance but do not possess a direct connection to the proposed model. Overall, the article presents several interesting observations, but disconnection across central elements of the model and the marginal degree of the data leave open significant uncertainty regarding the conclusions.

      Strengths:

      Many of the key results are examined across multiple cell models.

      The authors propose a highly innovative model to explain their results.

      Weaknesses:

      Although the authors attempt to replicate most key results across multiple models, the results are often marginal or appear to lack statistical significance. For example, the reduction in IL1R1 protein levels observed in HT1080 cells that possess long telomeres relative to HT1080 short telomere cells appears to be modest (Supplementary Figure 1I). Associated changes in IL1R1 mRNA levels are similarly modest.

      Related to the point above, a lack of strong functional studies leaves an open question as to whether observed changes in IL1R1 expression across telomere short/long cancer cells are biologically meaningful.

      Statistical significance is described sporadically throughout the paper. Most major trends hold, but the statistical significance of the results is often unclear. For example, Figure 1A uses a statistical test to show statistically significant increases in TRF2 occupancy at the IL1R1 promoter in short telomere HT1080 relative to long telomere HT1080. However, similar experiments (i.e. Figure 2B, Figure 4A - D) lack statistical tests.

    5. Reviewer #2 (Public Review):

      This study highlights the role of telomeres in modulating IL-1 signaling and tumor immunity. The authors demonstrate a strong correlation between telomere length and IL-1 signaling by analyzing TNBC patient samples and tumor-derived organoids. Mechanistic insights revealed non-telomeric TRF2 binding at the IL-1R1. The observed effects on NF-kB signaling and subsequent alterations in cytokine expression contribute significantly to our understanding of the complex interplay between telomeres and the tumor microenvironment. Furthermore, the study reports that the length of telomeres and IL-1R1 expression is associated with TAM enrichment. However, the manuscript lacks in-depth mechanistic insights into how telomere length affects IL-1R1 expression. Overall, this work broadens our understanding of telomere biology.

    1. Reviewer #2 (Public Review):

      In this study, the authors address discrepancies in determining the local bacterial burden in osteomyelitis between that determined by culture and enumeration by DNA-directed assay. Discrepancies between culture and other means of bacterial enumeration are long established and highlighted by Staley and Konopka's classic, "The great plate count anomaly" (1985). Here, the authors first present data demonstrating the emergence of discrepancies between CFU counts and genome copy numbers detected by PCR in S. aureus strains infecting osteocyte-like cells. They go on to demonstrate PCR evidence that S. aureus can be detected in bone samples from sites meeting a widely accepted clinicopathological definition of osteomyelitis. They conclude their approach offers advantages in quantifying intracellular bacterial load in their in vitro "co-culture" system.

      Weaknesses

      - My main concern here is the significance of these results outside the model osteocyte system used by this group. Although they carefully avoid over-interpreting their results, there is a strong undercurrent suggesting their approach could enhance aetiologic diagnosis in osteomyelitis and that enumeration of the infecting pathogen might have clinical value. In the first place, molecular diagnostics such as 16S rDNA-directed PCR are well established in identifying pathogens that don't grow. Secondly, it is hard to see how enumeration could have value beyond in vitro and animal model studies since serial samples will rarely be available from clinical cases.

      - I have further concerns regarding the interpretation of the combined bacterial and host cell-directed PCRs against the CFU results. Significance is attached to the relatively sustained genome counts against CFU declines. On the one hand, it must be clearly recognised that the detection of bacterial genomes does not equate to viable bacterial cells with the potential for further replication or production of pathogenic factors. Of equal importance is the potential contribution of extracellular DNA from lysed bacteria and host cells to these results. The authors must clarify what steps, if any, they have taken to eliminate such contributions for both bacteria and host cells. Even the treatment with lysostaphin may have coated their osteocyte cultures with bacterial DNA, contributing downstream to the ddPCR results presented.

      Strengths

      - On the positive side, the authors provide clear evidence for the value of the direct buffer extraction system they used as well as confirming the utility of ddPCR for quantification. In addition, the successful application of MinION technology to sequence the EF-Tu amplicons from clinical samples is of interest.

      - Moreover, the phenomenology of the infection studies indicating greater DNA than CFU persistence and differences between the strains and the different MOI inoculations are interesting and well-described, although I have concerns regarding interpretation.

    2. eLife assessment

      This useful study addresses discrepancies in determining bacterial burden in osteomyelitis as determined by culture and enumeration using DNA. The authors present compelling data demonstrating the emergence of discrepancies between CFU counts and genome copy numbers detected by PCR in Staphylococcus aureus strains infecting osteocyte-like cells. Whilst the observations may represent a substantial addition to the field of musculoskeletal infection, the broad applicability and clinical benefit are unclear.

    3. Reviewer #1 (Public Review):

      Summary:

      This work shows, based on basic laboratory investigations of in-vitro-grown bacteria as well as human bone samples, that conventional bacterial culture can substantially underrepresent the quantity of bacteria in infected tissues. This has often been mentioned in the literature, but relatively limited data have been provided to date. This manuscript compares culture to a digital droplet PCR approach, which consistently showed greater levels of bacteria across the experiments (and for two different strains).

      Strengths:

      Consistency of findings across in vitro experiments and clinical biopsies. There are real-world clinical implications for the findings of this study.

      Weaknesses:

      No major weaknesses. Only three human samples were analyzed, although the results are compelling.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      • A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides spanning S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during the culture using scRNAseq, scTCRseq, and examination of reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with HLA restrictions (which by itself represents a major achievement) together with their subset, based on their transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand upon vaccination. Thus, the authors concluded that highly-responding S-reactive T cells were established by vaccination from rare clonotypes.

      • An account of the major strengths and weaknesses of the methods and results.

      Strengths

      • Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.

      • Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses

      • Fig. 3 provides the epitopes, and the type of T cells, yet the composition of subsets per subject was not provided. It is possible that only one subject out of 4 sustainers expressed many Tfh clonotypes and explained the majority of Tfh clonotypes in the sustainer group. To exclude this possibility, the data on the composition of the T cell subset per subject (all 8 subjects) should be provided.

      In accordance with the reviewer’s suggestion, we provided the composition of the T cell subset per subject (all 8 subjects) in the revised manuscript (shown below).

      Author response image 1.

      • S-specific T cells were obtained after a 10-day culture with peptides in the presence of multiple cytokines. This strategy tends to increase a background unrelated to S protein. Another shortcoming of this strategy is the selection of only T cells amenable to cell proliferation. This strategy will miss anergic or less-responsive T cells and thus create a bias in the assessment of S-reactive T cell subsets. This limitation should be described in the Discussion.

      We thank the reviewer for raising the question related to our experimental strategy. We chose this method because its background unrelated to S protein was lower than that of the widely used AIM methods, as verified by reconstituting many TCRs and testing their responses in vitro. A further reason is that this method identifies S-reactive functional (proliferative) T cell clonotypes rather than anergic or less-responsive T cells, as the reviewer mentioned, which is our objective in this study. In accordance with the reviewer’s suggestion, we have carefully described the limitation and rationale of our experimental strategy in the revised manuscript.

      • Fig. 5 shows the epitopes and the type of T cells present at baseline. Do they react to HCoV-derived peptides? I guess not, as it is not clearly described. If the authors have the data, it should be provided.

      As the reviewer mentioned, the pre-existing highly expanded clonotypes that we analyzed did not react to HCoV-derived peptides. After we determined the epitopes of the clonotypes, the S peptide sequences were analyzed for homology in HCoVs. The only two clonotypes whose epitope sequences were relatively conserved in HCoV strains (clonotypes #8-pre_9 and #8-pre_10) were tested for their reactivity to the similar HCoV epitope counterparts, but no activation was observed (shown below). We added these data in the revised manuscript.

      Author response image 2.

      • As the authors discussed (L172), pre-existing S-reactive T cells were of low affinity. The raw flow data, as shown in Fig. S3, for pre-existing T cells may help discuss this aspect.

      As the reviewer mentioned, some pre-existing S-reactive T cells might appear to react with S peptides judging from the NFAT-GFP expression of their reporter cell lines. However, the percentage of GFP-expressing cells is affected by many factors such as TCR expression level and HLA molecule expression level. Thus, the affinity of pre-existing S-reactive T cells cannot be fully deduced from the activation of reporter cell lines as shown in Fig. S3 in the present manuscript. We thank the reviewer for this constructive suggestion, but we decided not to use these data quantitatively to evaluate affinity in this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      A short-term comparison of durability of S antibody levels after 2-dose vaccination, showing that better or more poorly sustained responses correlate with the presence of Tfh cells.

      Strengths:

      Novelty of approach in expanding, sequencing and expressing TCRs for functional studies from the implicated populations.

      Weaknesses:

      Somewhat outdated question, short timeline, small numbers, over-interpretation of sequence homology data

      Reviewer #2 (Recommendations For The Authors):

      In line with my above comments, it might be useful for the authors to look at moderating some of the assertions in what is a rather small-scale descriptive account of correlates of some quite nuanced, short-term, S antibody response differences

      We clearly described that some homologous microbe-derived peptides were indeed recognized by S-reactive T cells. Also, we have removed our overstatement from the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S-protein-specific T cells in people who receive SARS-CoV2 mRNA vaccines. To do this, the paper recruited a cohort of Covid-19 naive individuals who received the SARS-CoV2 mRNA vaccines and collected sera and PBMC samples at different timepoints. They then mainly generate three sets of data: 1) Anti-S protein antibody titers at all timepoints. 2) A single-cell RNAseq/TCRseq dataset for divided T cells after stimulation by S-protein for 10 days. 3) Corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: A) Individuals having a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset, which suggests Tfh-polarization of S-specific T cells can be a marker to predict the longevity of anti-S antibody. B) S-reactive T cells do exist before the vaccination, but they seem to be unable to respond to Covid-19 vaccination properly.

      The paper's strength is that it uses a very systematic and thorough strategy to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and corresponding epitopes, and indeed it reports several interesting findings about the relationship of Tfh/sustained antibody and about the S-reactive clones that exist before the vaccination. However, the main weakness is that these interesting claims are not sufficiently supported by the evidence presented in this paper. I have the following major concerns:

      (1) The biggest claim of the paper, which is the acquisition of S-specific Tfh clonotypes is associated with the longevity of anti-S antibodies, should be based on proper statistical analysis rather than just a UMAP as in Fig2 C, E, F. The paper only shows the pooled result, but it looks like most of the so-called Tfh cells come from a single donor #27. If separating each of the 4 decliners and sustainers and presenting their Tfh% in total CD4+ T cells respectively, will it statistically have a significant difference between those decliners and sustainers? I want to emphasize that solid scientific conclusions need to be drawn based on proper sample size and statistical analysis.

      In accordance with the reviewer’s request, we have also analyzed the T cells separately (shown below). We observed the average frequency was much lower in decliners than sustainers, while the difference did not reach statistical significance partly because of the large deviation due to one sustainer (#27) who possessed quite a high Tfh%. We modified our description in the revised manuscript.

      Author response image 3.

      (2) The paper does not provide any information to justify its cell annotation as presented in Fig 2B, 4A. Moreover, in my opinion, it is strange to see that there are two clusters of cells sitting on the left and right sides of the UMAP in Fig 2B but both are annotated as CD4 Tcm and Tem. Also, Tfh and Treg belong to the same cluster in Fig 2B but they should have very distinct transcriptomes and should be separated nicely. Therefore I believe the paper can be more convincing if it can present more information and discussion about the basis for its cell annotation.

      We agree with the reviewer’s concern. Since antigen stimulation only induced the proliferation of antigen-specific T cells, the multiple clusters were mostly due to the fluctuation of cell cycle-related genes. We therefore carefully and manually annotated these clusters by selecting the cell type-related genes (Kaech et al, Nat. Rev. Immunol., 2002; Sallusto et al, Annu Rev Immunol., 2004) and determined their subsets regardless of the automatic clustering based on the whole transcriptome. Indeed, antigen-responded Tfh and Treg are close, as ICOS and PDCD1 are expressed. We mainly used IL21 and FOXP3 to distinguish the Tfh and Treg populations, respectively. We thank the reviewer for pointing out this important process that we carefully addressed. We added the description of annotation methods to the revised manuscript.

      (3) In lines 103-104, the paper claims that the Tfh cluster likely comes from cTfh cells. However, considering the cells have been cultured/stimulated for 10 days, cTfh cells might lose all Tfh features after such culture. To the best of my knowledge, there is no literature to support the notion that cTfh cells, after being stimulated in vitro for 10 days (also in the presence of IL2, IL7 and IL15), can still retain a Tfh phenotype. It is possible that what actually happens is that, instead of having more S-specific cTfh cells before the cell culture, the sustainers' PBMCs create an environment that favors Tfh cell differentiation (such as expressing more pro-Tfh cytokines/co-stimulations). Thus, after 10 days of culture, there are more Tfh-like cells detected in the sustainers. The paper may need to include more evidence to support that cTfh cells can retain Tfh features after 10 days of culture.

      We thank the reviewer for raising this important issue. As the reviewer pointed out, culturing T cells for 10 days indeed changed the repertoire and features, so the Tfh clonotypes we detected after the expansion may not correspond to the cTfh clonotypes in vivo. Because our observation and analysis were mostly based on the dominant T cell clonotypes expanded in vitro, we modified our description and conclusion accordingly in the revised manuscript.

      (4) It is in my opinion inaccurate to use cell number in Fig4B to determine whether such clone expands or not, given that the cell number can be affected by many factors like the input number, the stimulation quality and the PBMC sample quality. A more proper analysis should be considered by calculating the relative abundance of each TCR clone in total CD4 T cells in each timepoint.

      We thank the reviewer for pointing out our inaccuracy. As the reviewer suggested, we used percentages to demonstrate the relative abundance of each clonotype in Fig. 4B of the revised manuscript.
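      The distinction the reviewer draws between raw cell counts and relative abundance can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name `clonotype_percentages` and the clonotype labels are hypothetical, and the input is assumed to be one clonotype label per CD4+ T cell recovered at a given timepoint.

```python
from collections import Counter


def clonotype_percentages(cell_clonotypes):
    """Return each clonotype's share (in %) of all cells at one timepoint.

    `cell_clonotypes` is a hypothetical input: one clonotype label per
    sequenced CD4+ T cell. Normalizing by the total makes timepoints with
    different input cell numbers comparable.
    """
    counts = Counter(cell_clonotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {clone: 100.0 * n / total for clone, n in counts.items()}


# Illustrative labels only; not data from the study.
pre = ["c1", "c1", "c2", "c3"]
post = ["c1", "c1", "c1", "c1", "c2", "c2", "c3", "c3"]

pre_pct = clonotype_percentages(pre)
post_pct = clonotype_percentages(post)
# c1 doubles in raw count (2 -> 4) yet its relative abundance is unchanged.
print(pre_pct["c1"], post_pct["c1"])  # 50.0 50.0
```

      In this toy case a clone that looks expanded by cell number shows no change once sampling depth is accounted for, which is the situation the percentage-based presentation is meant to rule out.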

      (5) It is well appreciated that each TCR was expressed in a cell line to determine the epitopes. However, the authors need to make very sure that this analysis is performed correctly because a large body of the paper's conclusions is based on such epitope analysis. I notice something strange (maybe I am wrong): for example, in Table 4, donor #8 clonotypes post_6 and _7 have exactly the same TRAV5 and TRAJ5 usage. Because the alpha chain does not have a D region, in theory these clonotypes, if they have the same VJ usage, should have the same alpha chain CDR3 sequences; however, in the table they have very different CDR3α aa sequences. I wish the authors could double-check their analysis, and I apologize in advance if I raise such questions based on wrong knowledge.

      We thank the reviewer for carefully reading our manuscript. Although the two clonotypes, donor #8 clonotype post_6 and _7, have exactly the same TRAV5 and TRAJ5 usage, they have different CDR3α aa sequences due to random nucleotide addition during rearrangement. Likewise, donor #27 clonotype post_1 and donor #13 clonotype post_15 had the same TRAV9-2 and TRAJ17 usage but different CDR3α.

      Reviewer #3 (Recommendations For The Authors):

      (1) Related to my public review 1. To make a solid conclusion, I think the author can include more sustainers and decliners if possible, can just stimulate their PBMCs for 10 days and check the Tfh features in proliferated CD4 T cells (e.g. IL21 secretion, PD-1 expression etc). And then compare these values in sustainers vs decliners

      We thank the reviewer for the suggestion. Unfortunately, additional PBMCs from more sustainers and decliners are not available to us. Instead, we carefully described the current observation in the revised manuscript.

      (2) Related to my public review 3. The authors can attempt to sort CXCR5+ cTfh and CXCR5- non-cTfh cells, stimulate them in vitro for 10 days and compare whether the stimulated cTfh cells still have more Tfh-related features such as increased IL-21 secretion.

      As the reviewer recommended, sorting and culturing the cTfh and non cTfh separately will clarify this issue. Due to the limitation of the samples, we could not perform these experiments.

      (3) I couldn't find information about the availability of data and code to analyze the single cell RNA-seq dataset in the manuscript

      We clarified the availability of data and added the codes for the single cell RNA-seq dataset in the revised manuscript.

    2. eLife assessment

      This study presents an important finding on the key factors of T cell responses associated with durable antibody responses following COVID-19 mRNA vaccinations. Though the sample size is small, and in-vitro stimulated T cells were used, the analysis and approaches were extensive, and the collected data were mostly solid. The results may greatly impact future COVID-19 vaccine design.

    3. Reviewer #1 (Public Review):

      • A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides spanning S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during the culture using scRNAseq, scTCRseq, and examination of reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with HLA restrictions (which by itself represents a major achievement) together with their subset, based on their transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand upon vaccination. Thus, the authors concluded that highly-responding S-reactive T cells were established by vaccination from rare clonotypes.

      • An account of the major strengths and weaknesses of the methods and results.

      Strengths:

      • Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.

      • Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses were properly addressed in the revised manuscript, and I do not have any additional concerns.

    4. Reviewer #3 (Public Review):

      Summary: The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S-protein-specific T cells in people who receive SARS-CoV2 mRNA vaccines. To do this, the paper recruited a cohort of Covid-19 naive individuals who received the SARS-CoV2 mRNA vaccines and collected sera and PBMC samples at different timepoints. They then mainly generate three sets of data: 1) Anti-S protein antibody titers at all timepoints. 2) A single-cell RNAseq/TCRseq dataset for divided T cells after stimulation by S-protein for 10 days. 3) Corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: A) Individuals having a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset. B) S-reactive T cells do exist before the vaccination, but they seem to be unable to respond to Covid-19 vaccination properly.

      The paper's strength is that it uses a very systematic and thorough strategy to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and corresponding epitopes, and indeed it reports several interesting findings about the relationship of Tfh clonotypes/sustained antibody and about the S-reactive clones that exist before the vaccination. The conclusion is solid in general but some claims are overstated. My suggestion is that the authors should further limit their claims in the abstract, for example:

      "Even before vaccination, S-reactive CD4+ T cell clonotypes did exist, most of which (MAY) cross-reacted with environmental or symbiotic bacteria" -- The paper does not have experimental evidence to show that these TCR clones respond to these epitopes.

      "These results suggest that de novo acquisition of memory Tfh-like cells upon vaccination (LIKELY) contributes to the longevity of anti-S antibody titers." -- Given the small sample size and the fact that the statistical analysis was not significant, this claim is overstated.

      "S-reactive T cell clonotypes detected immediately after 2nd vaccination polarized to follicular helper T (Tfh)-like cells (UNDER IN VITRO CULTURE)". -- The conclusion is based on in vitro cultured cells, which is a limitation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Comment 1: The authors showed increased plasma IL-22 and its expression in the intestine. Are intestinal ILC3s the main source of plasma IL-22?

      Reply: ILC3s are the main source of IL-22 as reported previously (PMID: 30700914). In the small intestine, ILC3s account for about 62% of IL22+ cells. Other IL22+ cells include γδ T, Foxp3+T and CD4+T cells.

      Comment 2: The authors transplanted intestinal ILC3s from NCD mice to DIO mice and showed significant metabolic improvements. However, in Fig. 1, intermittent fasting increased the proportion of IL-22-positive ILC3s rather than changing the total number. Please clarify whether the effect of this transplantation is due to increasing ILC3 number or introducing more IL-22-positive ILC3s (which are decreased in DIO). Are these transplanted ILC3s by default homing to the intestine rather than to other tissues?

      Reply: We believe that the transplantation increases ILC3s number, leading to the increment in IL22 levels. The transplanted ILC3s by default are homing to the intestine rather than to other tissues because ILC3s express several homing receptors such as CCR7, CCR9, and α4β7, which modulate their capacity to migrate to the gut (PMID: 26141583; PMID: 26708278; PMID: 25575242; PMID: 34625492). Our observation that ILC3s in adipose tissue remained unchanged by ILC3 cell transplantation (Supplementary Figure 5F) also supports this concept.

      Comment 3: Thermogenesis in this acute cold challenge is mainly by brown adipose tissue. Beiging is a chronic and adaptive response. Based on the data in WAT, there is a beiging phenotype, but the core body temperature in acute cold challenge is not an accurate readout. It would be a missed opportunity by not evaluating thermogenic activity in BAT. More browning genes should be included to strengthen the beiging phenotype of WAT. Moreover, inflammation in WAT can be examined to provide a whole picture of adipose tissue remodeling through this pathway.

      Reply: Per suggestion, we performed additional experiments to measure levels of inflammation-related genes such as Il4, Il1b, Il6, Il22, Il23, Il17a. As shown in supplemental figure 2D, these inflammation-related genes were not altered.

      Comment 4: For the SVF beige adipocyte differentiation, 100 ng/mL IL-22 was used. This is highly above the physiological concentration at ~5 pg/mL. Please justify this high concentration used.

      Reply: We agree with the reviewer that the dose of IL-22 used is high. However, the effective dose of 100 ng/ml used in our studies is consistent with the literature. Previous reports have shown that IL-22 directly activates Stat3 in adipose tissue and primary adipocytes, and promotes the expression of genes involved in triglyceride lipolysis (Lipe and Pnpla2) and fatty-acid β-oxidation (Acox1) at the dose of 100 ng/ml (Wang X, Ota N, et al. Nature. 2014). Consistently, other studies have reported that IL-22 at 100 ng/ml significantly reversed the enhanced expression of CCL2, CCL20 and IL1B mRNAs in granulosa cells in vitro (Qi X, et al. Nat Med. 2019).

      Comment 5: The authors showed increased Ucp1 and Cidea expression by IL-22 treatment in SVFs. Please be aware that these increases are likely due to boosted adipogenesis as told by the morphology. Please examine more adipogenic markers to confirm. Is this higher adipogenesis caused by the high concentration of IL-22?

      Reply: Per suggestion, we examined the expression of adipogenic marker genes such as Pparγ and Fabp4. We found that IL-22 did not increase the levels of these adipogenic marker genes relative to the PBS control as shown in supplemental figure 6F.

      Author response image 1.

      Comment 6: In line 201, the authors drew the conclusion that IL-22 increased SVF beige differentiation. To fully support this conclusion, the authors should assure adipogenesis at the same baseline and then compare beiging, or examine the effect of IL-22 on normal adipogenesis to compare with beige differentiation.

      Reply: We examined the expression of adipogenic marker genes such as Pparγ and Fabp4 and found that IL-22 did not increase the expression of these adipogenic marker genes relative to the PBS control.

      Reviewer #2:

      This study aims to investigate the mediatory role of intestinal ILC3-derived IL-22 in intermittent fasting-elicited metabolic benefits.

      Strengths:

      The observation of induction of IL-22 production by intestinal ILC3 is significant, and the scRNAseq provides new information into intestine-resident immune cell profiling in response to repeated fasting and refeeding.

      Weaknesses:

      The experimental design for some studies needs to be improved to enhance the rigor of the overall study. There is a lack of direct evidence showing that the metabolically beneficial effects of IF are mediated by intestinal ILC3 and their derived IL-22. The mechanism by which IL-22 induces a thermogenic program is unknown. The browning effect induced by IF may involve constitutive activation of lipolysis, which was not considered.

      Comment 1: Lack of direct evidence showing that IL-22-expressing ILC3s in intestine is the key contributor to intermittent fasting (IF)-mediated elevation of circulating IL-22 levels. The fraction of IL-22-expressing cells was increased threefold by IF but the increase in circulating IL-22 is moderate (Figs. 1J and 1K).

      Reply: IL-22 in circulation is subject to clearance, degradation, and binding to plasma proteins, among other processes. Thus, circulating levels of IL-22 may be much lower than the amount secreted by the intestinal IL-22 positive ILC3s.

      Comment 2: The loss of fat mass by IF suggests that the active lipolysis may explain the white fat browning which was not considered. This may apply to the observations in IL-22 treated mice as well as IL-22R KO mice.

      Reply: We analyzed the expression of genes related to lipolysis in NCD and NCD-IF mice and found that IF did not alter the levels of these genes in white adipose tissues (Supplementary figure 2D). We have addressed this concern in line 119, page 6.

      Author response image 2.

      Comment 3: IL-22 administration and adoptive transfer of ILC3 had no significant effect on body weight. Not clear how IL-22 improves insulin sensitivity in this case.

      Reply: Our results are consistent with a previous report showing that IL-22 administration improves insulin sensitivity without change in body weight (Qi X, et al. Nat Med. 2019). In addition, previous studies have demonstrated that IL-22 can increase Akt phosphorylation in muscle, liver and adipose tissues, leading to improvement in insulin sensitivity (Wang X, et al. Nature. 2014). We have addressed this potential mechanism in lines 192-195, page 9.

      Comment 4: The energy expenditure data look unusual given that there was little increase in oxygen consumption during dark cycle compared to light cycle (Fig.3).

      Reply: The lack of a clear difference in oxygen consumption between the dark cycle and the light cycle may be due to a technical problem with the system.

      Comment 5: The thermogenic capacity for the whole fat pad needs to consider the expression of UCP1 in certain amount of tissue and the total mass for each individual animal because the mRNA level itself does not reflect the whole tissue capacity.

      Reply: We used the whole subcutaneous adipose tissue from one side for qPCR to reflect the whole tissue capacity.

      Comment 6: The design of the studies for the adoptive transfer of ILC3 is a concern. PBS is not a good control for the group with ILC3 cells (Figs. 2A-2H). A similar issue applies to the co-culture study, in which beige only is not an ideal control for Beige+ILC3 (Figs. 2I-2J).

      Reply: We agree with the reviewer that the PBS is not a good control. Because we cannot find a similar immune cell without any effect on adipocytes, we designed this experiment based on other studies in which saline or PBS are used as ILC transfer experiment controls (Sasaki T, et al. Cell Rep. 2019; Wang H, et al. Nat Commun. 2019)

      Comment 7: The induction of thermogenesis by IL-22 treatment may be related to enhanced differentiation rather than direct activation of thermogenic genes (Figs. 4G and 4H).

      Reply: Our observation that IL-22 did not alter the levels of genes related to adipogenesis (Supplemental figure 6F) indicates that IL-22 may not alter the differentiation of adipocytes. We addressed this concern in Lines 211-212, page 10.

      Reviewer #3:

      Chen et al. investigated how intermittent fasting causes metabolic benefits in obese mice and found that intestinal ILC3 and IL-22-IL-22R signaling contribute to the beiging of white adipose tissue (WAT) and consequent metabolic benefits including improved glucose and lipid metabolism in diet-induced obese mice. They demonstrate that intermittent fasting causes increased IL22+ILC3 in small intestines of mice. Adoptive transfer of purified intestinal ILC3 or administration of exogenous IL-22 can lead to increases in UCP1 gene expression and energy expenditure as well as improved glucose metabolism. Importantly, the above metabolic benefits caused by intermittent fasting are abolished in IL-22R-/- mice. Using an in vitro experiment, the authors show that ILC3-derived IL-22 may directly act on adipocytes to promote SVF beige differentiation. Finally, by performing sc-RNA-seq analysis of intestinal immune cells from mice with different treatments, the authors indicate a possible way in which intestinal ILC3 are activated by intermittent fasting. Overall, this study provides a new mechanistic explanation for the metabolic benefits of intermittent fasting and reveals the role of intestinal ILC3 in the enhancement of the whole-body energy expenditure and glucose metabolism likely via IL-22-induced beige adipogenesis.

      Although this study presents some interesting findings, particularly IL-22 derived from intestinal ILC3 could induce beiging of WAT by directly acting on adipocytes, the experimental data are not sufficient to support the key claims in the manuscript.

      Comment 1: Only increased UCP1 expression on mRNA level is not enough to support the beiging of WAT. More methods such as western blotting and immunostaining of UCP1 in WAT are needed to confirm the enhanced beige adipogenesis.

      Reply: Additional experiments have been performed to measure the UCP1 protein by Western blot. The data is included in Figure 4I and Supplementary Figure 2E.

      Comment 2: IL-22 is known to modulate metabolic pathways via multiple downstream functions. The use of whole-body knockout of IL-22R could not exclude the indirect effect on the promotion of beiging of WAT. Specific deletion of IL-22R in adipose tissues is therefore needed to confirm the direct effect of IL-22 on adipocytes which is suggested by the in vitro study.

      Reply: We agree with the reviewer that specific deletion of IL-22R in adipose tissues is critical to confirm the direct effect of IL-22 on adipocytes. We will generate AdipoQ-IL-22R-/- mice to test this concept further in vivo.

      Comment 3: The authors failed to show the cellular distribution of IL-22R in adipose tissues. This is important because the mechanism that explains the increased beige adipogenesis could be different based on the expression of IL-22R in adipose progenitor cells or mature adipocytes. So it is not appropriate to conclude that "IL-22 then directly activates IL-22R on adipocytes, leading to subsequent induction of beiging of white adipose tissue" in line 407. Additionally, Oil red O staining is needed for Fig 4G and Fig 5J, and protein levels of UCP1 and adipogenesis-related markers are needed to evaluate beige fat differentiation and the whole adipogenesis.

      Reply: Per suggestion, we have added the expression of IL-22R in adipose progenitor cells or mature adipocytes (Supplementary Figure 6E). In addition, protein levels of UCP1 and adipogenesis-related markers to evaluate the whole adipogenesis (Figure 4I, Supplementary figure 6F) are now included. We have also addressed this issue in lines 207-215, page 10.

      Comment 4: Although the authors provided some hypothesis about how intermittent fasting increases IL-22+ILC3 in small intestines by sc-RNA-seq analysis, some functional assays are needed to identify the factors, for example, how about the levels of macrophage-derived IL-23 or AHR ligands in small intestines and whether they contribute to increased percentages of intestinal IL-22+ILC3 following intermittent fasting.

      Reply: We used flow cytometry sorting of macrophages combined with qPCR experiments to preliminarily demonstrate that intermittent fasting increases the expression of molecules such as Cd44 and Ccl4 (Supplementary Figure 10B), which may contribute to the increase in the proportion of IL-22+ ILC3s in the intestine under intermittent fasting. Our observation that IL-23 mRNA levels were not changed indicates that this molecule may not be the major contributor to the communication between macrophages and ILC3s. Other potential molecules such as AHR ligands remain to be explored.

      Comment 5: What are the differences between adipose ILC3 and intestinal ILC3? Why do transferred ILC3 only migrate to the small intestine but not WAT of recipient mice? It would be better to examine or at least discuss whether other factors from intestinal ILC3 may also contribute to beiging of WAT following intermittent fasting.

      Reply: Intestinal ILC3s specifically express the gut homing receptors CCR7, CCR9, and α4β7 (PMID: 26141583; PMID: 26708278; PMID: 25575242; PMID: 34625492). This may explain why transplanted intestinal ILC3s migrate mainly to the intestine rather than to adipose tissue (PMID: 34625492). The proportion of ILC3s in mouse adipose tissue is very small, and their functions have not yet been clarified. We have addressed this issue in lines 156-158, page 8.

      There are some other factors from intestinal ILC3 which may also contribute to beiging of WAT following intermittent fasting. By secreting IL-22, ILC3 enhanced the intestinal mucosal barrier, leading to reduction of the influx of LPS and PGN into the bloodstream under high-fat diet conditions, and subsequent increase in the beiging of white adipose tissue (Chen H, et al. Acta Pharm Sin B. 2022). We have addressed this potential mechanism in lines 344-347, page 16.

      Comment 6: The sensitivity of the IL-22 ELISA kit used in the manuscript was 8.2 pg/mL, according to the information from the methods, however, in Fig. 1J and Fig. 2B, the IL-22 levels in mouse plasma were lower than 6 pg/mL, which was below the sensitivity of the ELISA kit and also the assay range. Please explain.

      Reply: We have double-checked the original data and found that we have made a mistake in calculating the concentration of IL-22. We have corrected this error (Fig. 1J, Fig. 2B).

      Comment 7: In Fig 7A, the significance of the Hypothesis testing should be marked. In Fig 7F and 7G, the contrast between the two groups is not apparent, other comparing ways could be used to enhance the readability.

      Reply: Per suggestion, we have marked the significance of the hypothesis testing between HFD vs NCD and HFD-IF vs HFD in Fig7A. Shown in Fig 7F and 7G are the top 20 enriched interacting proteins between different cell types. The dot plot displays the average expression level and significance of protein interactions in cell types.

      Comment 8: The total food intake of fasting mice fed with NCD or HFD was less than those without fasting, and the food intake rate the author showed in Fig S1 represents the value that was normalized to body weight. So the author should describe it precisely in line 114.

      Reply: We have revised the statement accordingly in lines 114-115.

      Comment 9: Western blotting analysis has been described in methods, however, there is no corresponding experimental data in the result part.

      Reply: The Western blotting results are now included.

    2. Reviewer #3 (Public Review):

      Chen et al. investigated how intermittent fasting causes metabolic benefits in obese mice and find that intestinal ILC3 and IL-22-IL-22R signaling contribute to the beiging of white adipose tissue (WAT) and consequent metabolic benefits including improved glucose and lipid metabolism in diet-induced obese mice. They demonstrate that intermittent fasting causes increased IL22+ILC3 in small intestines of mice. Adoptive transfer of purified intestinal ILC3 or administration of exogenous IL-22 can lead to increases in UCP1 gene expression and energy expenditure as well as improved glucose metabolism. Importantly, the above metabolic benefits caused by intermittent fasting are abolished in IL-22R-/- mice. Using an in vitro experiment, the authors show that ILC3-derived IL-22 may directly act on adipocytes to promote SVF beige differentiation. Finally, by performing sc-RNA-seq analysis of intestinal immune cells from mice with different treatments, the authors indicate a possible way of intestinal ILC3 being activated by intermittent fasting. Overall, this study provides a new mechanistic explanation for the metabolic benefits of intermittent fasting and reveals the role of intestinal ILC3 in the enhancement of the whole-body energy expenditure and glucose metabolism likely via IL-22-induced beige adipogenesis.

      Although this study presents some interesting findings, particularly IL-22 derived from intestinal ILC3 could induce beiging of WAT by directly acting on adipocytes, the experimental data are not sufficient to support the key claims in the manuscript.

    3. eLife assessment

      This study provides valuable findings showing that the production of IL-22 from intestinal ILC3 during intermittent fasting promotes beigeing of white adipose tissue. The authors provided solid data and mechanistic insight into how ILC3-derived IL-22 directly induces beigeing.

    4. Reviewer #1 (Public Review):

      In the present study, the authors carefully evaluated the metabolic effects of intermittent fasting on normal chow and HFD fed mice and reported that intermittent fasting induces beiging of subcutaneous white adipose tissue. By employing complementary mouse models, the authors provided compelling evidence to support a mechanism through the ILC3/IL-22/IL22R pathway. They further performed comprehensive single-cell sequencing analyses of intestinal immune cells from lean mice, obese mice, and obese mice that had undergone intermittent fasting, and revealed an altered interactome in intestinal myeloid cells and ILC3s by intermittent fasting via activating AhR. Overall, this is a very interesting and timely study uncovering a novel connection between intestine and adipose tissue in the context of executing metabolic benefits of intermittent fasting.

      (1) The authors showed increased plasma IL-22 and its expression in intestine. Are intestinal ILC3s the main source of plasma IL-22?

      (2) The authors transplanted intestinal ILC3s from NCD mice to DIO mice and showed significant metabolic improvements. However, in Fig. 1, intermittent fasting increased IL-22-positive ILC3s proportion rather than changing the total number. Please clarify whether this transplantation is due to increasing ILC3s number or introducing more IL-22 positive ILC3s (which are decreased in DIO). Are these transplanted ILC3s by default homing to intestine rather than to other tissues?

      (3) The authors adopted cold challenge at 4 degree for 6 hours to assess beiging in subcutaneous WAT and showed difference in core temperature. However, thermogenesis in this acute cold challenge is mainly by brown adipose tissue. Beiging is a chronic and adaptive response. Based on the data in WAT, there is a beiging phenotype, but the core body temperature in acute cold challenge is not an accurate readout. It would be a missed opportunity by not evaluating thermogenic activity in BAT. More browning genes should be included to strengthen the beiging phenotype of WAT. Moreover, inflammation in WAT can be examined to provide a whole picture of adipose tissue remodeling through this pathway.

      (4) For the SVF beige adipocyte differentiation, 100 ng/mL IL-22 was used. This is highly above the physiological concentration at ~5 pg/mL. Please justify this high concentration used.

      The authors showed increased Ucp1 and Cidea expression by IL-22 treatment in SVFs. Please be aware that these increases are likely due to boosted adipogenesis as told by the morphology. Please examine more adipogenic markers to confirm. Is this higher adipogenesis caused by the high concentration of IL-22? In line 201, the authors drew the conclusion that IL-22 increased SVF beige differentiation. To fully support this conclusion, the authors should assure adipogenesis at the same baseline and then compare beiging, or examine the effect of IL-22 on normal adipogenesis to compare with beige differentiation.

    5. Reviewer #2 (Public Review):

      Summary: This study aims to investigate the mediatory role of intestinal ILC3-derived IL-22 in intermittent fasting-elicited metabolic benefits.

      Strengths: The observation of induction of IL-22 production by intestinal ILC3 is significant, and the scRNAseq provides new information into intestine-resident immune cell profiling in response to repeated fasting and refeeding.

      Weaknesses: The experimental design for some studies needs to be improved to enhance the rigor of the overall study. There is a lack of direct evidence showing that the metabolically beneficial effects of IF are mediated by intestinal ILC3 and their derived IL-22. The mechanism by which IL-22 induces the thermogenic program is unknown. The browning effect induced by IF may involve constitutive activation of lipolysis, which was not considered.

      The majority of the weaknesses have been addressed in the revision. Based on the analysis of thermogenic genes in addition to Ucp1 (Fig. 4D and S6F), the alteration in thermogenesis induced by IL-22 is dependent on UCP1 but not on other markers such as PGC1a, PPARg, and Cidea. These data need to be discussed in the Discussion section.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We have made substantial revisions to the manuscript, incorporating new data, which led to a renumbering and relabeling of several figures:

      • Figure 3F now features a modified graph color.

      • Figure 4I introduces a new experiment.

      • What was previously labeled as Figure 4I-O is now Figure 4J-P.

      • Figure 5H presents another new experiment.

      • The earlier Figure 5H is now rebranded as Figure 5I.

      • A fresh experiment has been incorporated into Supplement Figure 1a.

      • The former Supplement Figure 1a is now Supplement Figure 1b.

      • Supplement Figure 2d describes an additional new experiment.

      • In accordance with the HUGO gene nomenclature committee (HGNC) recommendations, we've updated the names of genes/proteins in both figures and their accompanying legends.

      Reviewer #1 (Recommendations For The Authors):

      Comment #1. Standard practice would include multiple TNBC cell lines to test the author's hypotheses, but the authors rely only on one cell line in the entire paper, MDA-MB-231 cells. The authors do correlate their findings to patient data, but the inclusion of an additional TNBC cell line would strengthen their findings about the L-DOXR cells and help with the assessment as to how reproducible their original microfluidics system is.

      Response: Thank you for your valuable feedback. We recognize the importance of utilizing multiple TNBC cell lines for rigorous validation and reproducibility. There are several reports highlighting the generation of L-DOXR cells in other types of breast cancer cell lines, such as MCF-7 (Fei et al., 2015), and in other cancer types like the prostate cancer cell line PC-3. These studies utilized a microfluidic device with a concentration gradient of Doxorubicin. With this existing evidence, we are confident that a variety of cancer cell types have the potential to form L-DOXR cells in a doxorubicin gradient. The cited reports support our choice of the MDA-MB-231 cell line for our current study:

      “L-DOXR cells exhibit increased genomic content (4N+) as compared to WT cells. The presence of cells with increased nuclear size and increased genomic content has been demonstrated to be associated with poor clinical outcomes in several types of cancers (Alharbi et al., 2018; Amend et al., 2019; Fei et al., 2015; Imai et al., 1999; Liu et al., 2018; Lv et al., 2014; Mukherjee et al., 2022; O’connor et al., 2002; Saini et al., 2022; Trabzonlu et al., 2023). (Page 5, Line 24)”

      However, we acknowledge the validity of your point regarding the strengthening of our findings with the inclusion of additional TNBC cell lines. We are considering expanding our research in future studies to further validate our findings across multiple TNBC cell lines. Thank you for bringing this to our attention, and we hope our response adequately addresses your concerns.

      Comment #2. It would be helpful to comment on the frequency at which doxorubicin is used clinically to treat TNBC patients. The authors equate their resistance phenotype to all chemotherapies (in patient data and title) but only test doxorubicin. Does NUPR1 overexpression result in resistance to other chemotherapies?

      Response: Thank you for raising these pertinent questions. To address your first point regarding the clinical use of doxorubicin for TNBC patients: At the Samsung Medical Center, the typical chemotherapy regimen for TNBC patients involves administering Neo. AC (Doxorubicin 34 mg + Cyclophosphamide 840 mg per session) four times, followed by Adj. D (Docetaxel 25 mg + 80 mg per session) for another four sessions. This provides insight into the clinical relevance and frequency of Doxorubicin's use in treating TNBC.

      Regarding your second point about NUPR1 overexpression and its broader implications for chemotherapy resistance: Yes, NUPR1 overexpression has been documented to result in resistance to various chemotherapies. A study by Lei Jiang et al. in the Journal of Pharmacy and Pharmacology found that NUPR1 plays a role in YAP-mediated gastric cancer malignancy and drug resistance through the activation of AKT and p21 (Jiang et al., 2021, https://doi.org/10.1093/jpp/rgab010). Additionally, another study by Wang et al. in Cell Death and Disease observed that the transcriptional coregulator NUPR1 is linked to tamoxifen resistance in breast cancer cells (Wang et al., 2021, https://doi.org/10.1038/s41419-021-03442-z). In light of this, while our study primarily focused on doxorubicin, the role of NUPR1 in resistance spans across various chemotherapeutic agents, adding depth to our findings and their broader implications in cancer therapy.

      Comment #3. The authors knockdown NUPR1 in L-DOXR cells, but overexpression of NUPR1 in WT TNBC cells to see if this renders the WT cells more resistant would be an important experiment.

      Response: We appreciate the reviewer's suggestion, which indeed underscores an important aspect of our study. In response, we have incorporated additional experiments in the revised manuscript. Specifically, on page 7 (lines 7-8) and in Supplement Figure 2c, we present data from experiments where we overexpressed Nupr1 in WT-MDA-MB231 cells. Our findings revealed that overexpression of GST-Nupr1 not only attenuates Dox-induced cell death but also mildly enhances cell viability in WT cells even without DOX treatment. This implies that cells expressing Nupr1 exhibit resistance to the cytotoxic effects of DOX. We believe these new data further solidify our conclusions and address the valuable point you raised.

      Comment #4. The similar colors/symbols chosen for the different groups in the xenograft plots are hard to easily interpret without zooming in.

      Response: We modified the xenograft plots as you recommended in Figure 3F.

      Comment #5. There are some grammatical errors throughout the paper. Below is an example: In the opening of the Discussion "TNBC is the most aggressive subtype of breast cancer, and chemotherapy is a mainstay of treatment. However, chemoresistance is common and contributes to the long-term survival of TNBC patients" - this sentence makes it seem like chemoresistance makes TNBC patients survive longer. The following sentence "These cells demonstrated a large phenotype with increased genomic content." is abrupt and doesn't make sense. Consider carefully re-reading the manuscript for grammatical errors.

      Response: Thank you for highlighting the grammatical errors and providing specific examples. We deeply apologize for the oversight. In response to your feedback, we have carefully re-reviewed the manuscript and made the necessary corrections. Based on your examples, we have revised the sentences to: “TNBC is the most aggressive subtype of breast cancer, with chemotherapy being a mainstay of treatment. However, the development of chemoresistance frequently occurs and poses significant challenges to the long-term survival prospects of TNBC patients.” “As for the cells in question, they exhibited an enlarged phenotype along with an increased genomic content.”

      We appreciate your meticulous review, and we have made an effort to address and rectify other such errors throughout the manuscript.

      Reviewer #2 (Recommendations for The Authors):

      I recommend the authors to address the following minor issues. Below are specific comments on the manuscript.

      Comments # 1. Thank you for the comment. In CDRA chip, DOXR cells and L-DOXR cells appeared in the mid-DOX region. What is the concentration of DOX in this region? Can the authors calculate the concentrations of DOX in high-, mid-, and low- regions (or ranges of concentrations)?

      Response: Instead of DOX, we used FITC dye to visualize the concentration gradient across the chip, as shown below, because DOX generates only very weak fluorescence.

      Author response image 1.

      While our method provides an estimate rather than a precise measurement, owing to the difference in molecular weight between FITC (389.38 g/mol) and DOX (579.98 g/mol), it is still possible to approximate the distribution of DOX concentrations across the different regions. For each region, we multiply the ratio of its average FITC fluorescence intensity to the highest recorded fluorescence intensity by the peak DOX concentration (1.5 μM): C_region ≈ (I_region / I_max) × 1.5 μM. This gives an estimated average DOX concentration in each region, acknowledging that the diffusion characteristics of FITC and DOX may differ because of their different molecular weights.

      With this formula, the estimated concentrations are: high region = 1.161 μM; mid region = 0.554 μM; low region = 0.098 μM.
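
      The scaling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_dox_um` is hypothetical, and the relative intensities are back-calculated from the reported concentrations rather than taken from the authors' measurements, which are not given in the response.

```python
PEAK_DOX_UM = 1.5  # peak DOX concentration stated in the response (μM)


def estimate_dox_um(region_mean_intensity, max_intensity, peak_dox_um=PEAK_DOX_UM):
    """Scale the peak DOX concentration by a region's relative FITC intensity.

    Hypothetical helper: assumes DOX follows the same gradient as FITC.
    """
    if max_intensity <= 0:
        raise ValueError("max_intensity must be positive")
    return peak_dox_um * region_mean_intensity / max_intensity


# Relative intensities implied by the reported estimates (assumption:
# back-calculated as reported concentration / 1.5 μM, not measured values).
implied_ratios = {
    "high": 1.161 / PEAK_DOX_UM,
    "mid": 0.554 / PEAK_DOX_UM,
    "low": 0.098 / PEAK_DOX_UM,
}

for region, ratio in implied_ratios.items():
    # max_intensity is set to 1.0 because the ratios are already normalized
    print(region, round(estimate_dox_um(ratio, 1.0), 3))
```

      Read this way, the reported values correspond to relative FITC intensities of roughly 0.77, 0.37, and 0.07 of the maximum in the high, mid, and low regions.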

      Comment #2. Is there any other phenotypic difference between DOXR cells and L-DOXR cells besides their size?

      Response: In addition to differences in cell size, L-DOXR cells exhibit several distinct phenotypic characteristics when compared to DOXR cells. These include variations in the cell cycle profile (as detailed in Fig. 2F-H), altered drug efflux capabilities (presented in Fig. 2I-J), and changes in nuclear morphology (illustrated in Fig. S3D). These phenotypic distinctions suggest that L-DOXR cells may have adapted unique mechanisms of resistance and survival, which are depicted in the figures mentioned.

      Comment #3. Please add a description of abbreviations when the abbreviation is first used in the manuscript (e.g. NUPR1, HDAC11 etc.).

      Response: We corrected the mistake.

      Comment # 4. Figure 2B is the schematic of the chip, not the dimension of the chip. Please add the dimension of the chip to keep the figure caption as is or change the figure caption.

      Response: Thank you for the correction. We have changed the figure caption to “Schematic of the chip”.

      Reviewer #3 (Recommendations for The Authors):

      In this manuscript, Lim and colleagues use an innovative CDRA chip platform to derive and mechanistically elucidate the molecular wiring of doxorubicin-resistant (DOXR) MDA-MB-231 cells. Given their enlarged morphology and polyploidy, they termed these cells Large-DOXR (L-DOXR). Through comparative functional omics, they deduce the NUPR1/HDAC11 axis to be essential in imparting doxorubicin resistance and, consequently, genetic or pharmacologic inhibition of NUPR1 to restore sensitivity to the drug. Although innovative, some deficiencies in the present manuscript slightly weaken the primary conclusions. A couple of critical issues are the use of a single cell line model (i.e., MDA-MB-231) for all the phenotypic and functional experiments and absolutely no mechanistic insights into how NUPR1 imparts resistance to doxorubicin. Some questions and comments are listed below for the authors' consideration and response:

      Major:

      Comment #1. The authors treated only the MDA-MB-231 cells with doxorubicin in the CDRA chip. Do other TNBC cell lines (namely, MDA-MB-436, HCC1187, or others) respond similarly to dox treatment, eventually yielding enlarged, aneuploid cells with the resistant phenotype? It is important to show that this phenotype is not confined to a single cell line, particularly given the numerous TNBC models that are commonly used.

      Response: Thank you for your insightful query regarding the generalizability of our findings across different TNBC cell lines. In this initial study, we focused exclusively on MDA-MB-231 cells due to their widespread use as a model for aggressive triple-negative breast cancer and the constraints of time and resources. While we cannot definitively claim that the observed phenotypic changes upon doxorubicin treatment will be identical in other TNBC cell lines such as MDA-MB-436 or HCC1187, we hypothesize that the underlying mechanisms of chemoresistance and cellular response could be similar across various TNBC models. This hypothesis is supported by literature indicating common pathways of drug resistance in TNBC. We believe that our findings lay the groundwork for future studies to explore the response of a broader range of TNBC cell lines to doxorubicin treatment. Such studies would greatly enhance our understanding of the cellular adaptations to chemotherapeutic agents in TNBC and help to validate the potential universal application of our findings.

      Comment #2: Do the L-DOXR cells permanently hold onto the enlarged and polyploid states upon prolonged culture in vitro? Does that change given the presence or withdrawal of the drug? In other words, is the physical state of the resistant cells reversible, or is it passed onto the progeny cells regardless of continued stress from the drug?

      Response: Thank you for your question about the stability of the phenotypic changes in L-DOXR cells. Our observations suggest that the enlarged and polyploid states in L-DOXR cells are not permanently fixed. When cultured in vitro over an extended period without the selective pressure of doxorubicin, we have noted that some cells may revert to a non-polyploid state. However, this reversion does not seem to be a stable change as subsequent generations can present with polyploidy again, even in the absence of the drug. This indicates a potential epigenetic or microenvironmental influence on the phenotypic state of these cells, suggesting a complex interplay between the drug-induced stress and the inherent cellular response mechanisms. Further investigation is needed to fully understand the dynamics of these phenotypic changes and whether they are heritable and/or reversible under different culture conditions.

      Comment #3: In Figures 2F-H, the authors perform DNA-staining-based FACS to estimate the ploidy of the cells. These estimations could be improved using 2D cell cycle analyses using EdU or BrdU co-treatment and staining. This would further allow a clear distinction between S-phase and G0/G1 and M-phase cells in the WT, DOXR, and L-DOXR populations.

      Response: Thank you for the suggestion to enhance the accuracy of our ploidy estimations. We appreciate the advice to implement 2D cell cycle analyses using EdU or BrdU co-treatment and staining, as this could indeed provide a clearer distinction between the various phases of the cell cycle in our WT (wild-type), DOXR (doxorubicin-resistant), and L-DOXR (large doxorubicin-resistant) cell populations. Incorporating these thymidine analogs would allow us to label newly synthesized DNA and thereby accurately delineate cells in the synthesis phase from those in the G0/G1 and M phases. This approach will likely add depth to our understanding of the cell cycle dynamics and the mechanism behind the drug resistance phenotype. We will consider incorporating these techniques in our future experiments to validate and extend the findings reported in this study.

      Comment #4. In Figure 3H, the authors quantitate the number of enlarged cells detected in human specimens of TNBC or normal breast tissues. How were these cells detected simply using the H&E staining, particularly when assessing the genomic content? Were certain size and nuclear staining intensity thresholds used for these categorizations? If so, these should be mentioned in the paper.

      Response: In our study, we identified enlarged cells within human TNBC and normal breast tissue specimens using H&E staining, and their quantitation was carried out using the Colour Deconvolution 2 plugin (Landini G et al., 2020) within the ImageJ software. This method allowed us to analyze the staining intensity and cell size systematically. To ascertain that we were indeed observing cells with increased genomic content, we established specific size and nuclear staining intensity thresholds. Cells exceeding these predetermined thresholds were categorized as 'enlarged'. Additionally, we used continuous serial slides for the human TNBC tissues microarray (BR1301, US Biomax) for more accurate comparisons in Figures 3H, I, and 5H. To strengthen our findings, we verified that NUPR1 expression, which is associated with the observed cell enlargements, was indeed elevated in these same cells from the patient samples. We have detailed these methodological aspects and the criteria for cell categorization in the 'Tissue Microarray and Immunohistochemistry' section of our Materials and Methods to ensure clarity and reproducibility of our results.

      Comment #5: In Figure 3I, the authors label the enlarged cells in the patient tissues as L-DOXR cells. Were these assessments done in dox-treated tumors? Even if that is the case, it'll be unfair to call them resistant to doxorubicin. The axis label "% enlarged cells" might be more accurate.

      Response: We appreciate the reviewer's attention to detail and agree that the terminology used in Figure 3I was inaccurate. The cells identified in patient tissues were labeled based on their morphological resemblance to L-DOXR cells observed in vitro; however, these patient tissue samples were not confirmed to be treated with doxorubicin, nor were the cells confirmed to be resistant. Therefore, we have amended the figure legend to reflect this and now refer to these cells simply as “enlarged cells”.

      Comment #6: The authors uncovered that NUPR1 expression is dramatically increased in the L-DOXR cells vs the wild-type cells. How does the NUPR1 gene expression and activity compare between L-DOXR and DOXR MDA-MB-231 cells?

      Response: Thank you for the valuable comment. The data are included in figure supplement 3, and we have revised the manuscript as follows: “While DOXR cells exhibited a marked increase in Nupr1 expression compared to the WT cells, this expression was substantially less than that observed in L-DOXR cells, as detailed in figure supplement 3.” (Page 7, Line 3).

      Comment #7: Following from above, the authors show that NUPR1 activity is not necessary for cell survival in the absence of doxorubicin (Fig. 4H). But, does it control the cellular size and polyploid states of the L-DOXR cells? In other words, is there any association between increased size and genomic content of the cells to their sensitivity to doxorubicin? Are cells resistant to other chemotherapeutics as well? Or is the resistant phenotype specific to doxorubicin? The authors causally implicate NUPR1 in driving the dox-resistant phenotype in MDA-MB-231 cells. To fully substantiate this claim, the authors should perform gain-of-function studies, in at least 2-3 TNBC cell lines, to show that over-expression of NUPR1 alone is sufficient to impart doxorubicin resistance. Also, the most critical information missing from the study is how NUPR1 drives resistance to doxorubicin. What is the function of NUPR1 in L-DOXR cells and what gene expression program does it activate to impart the resistant phenotype?

      Response: In our loss-of-function and gain-of-function experiments with Nupr1 in L-DOXR cells, we did not notice any specific changes in the cellular size or polyploid state of the cells. Although we cannot rule out the possibility that phenotypically larger cells might also arise in response to chemotherapeutics other than DOX, in the current study we found that a high level of Nupr1 expression is correlated with sensitivity to doxorubicin in L-DOXR cells. Moreover, following the reviewer’s suggestion, we performed a gain-of-function study to determine whether overexpression of NUPR1 alone is sufficient to impart doxorubicin resistance in TNBC cells. Overexpression of GST-NUPR1 attenuated DOX-induced cell death and slightly increased the viability of vehicle-treated WT (MDA-MB231) cells, indicating that NUPR1-expressing cells are resistant to the cytotoxic effect of DOX. We have also demonstrated that Nupr1 upregulation in L-DOXR cells is due to suppressed expression of HDAC11, as we found that HDAC11 regulates acetylation of the Nupr1 promoter in these cells. Thus, it is conceivable that increased expression of Nupr1 upon HDAC11 suppression in L-DOXR cells is at least partly responsible for doxorubicin resistance.

      Comment #8: Do the authors speculate the dox-resistant phenotype to be restricted to basal TNBC tumors or even NUPR1-high ER+ breast cancer cells (MCF7 or T47D) would likely be resistant to doxorubicin or other chemotherapeutics?

      Response: Yes, NUPR1-high ER+ breast cancer cells (MCF7 or T47D) would likely be resistant to doxorubicin or other chemotherapeutics, as reported elsewhere: Wang, L., Sun, J., Yin, Y. et al. Transcriptional coregulator NUPR1 maintains tamoxifen resistance in breast cancer cells. Cell Death Dis 12, 149 (2021). https://doi.org/10.1038/s41419-021-03442-z

      Comment #9: The authors suggest that HDAC11 continuously deacetylates the NUPR1 promoter to suppress its expression. Consequently, does the inactivation of HDAC11 in wild-type TNBC cells lead to NUPR1 up-regulation? Is this increase in NUPR1 expression reverted upon inhibition of the HAT machinery (say P300/CBP) in HDAC11-deficient TNBC cells?

      Response: In the revised manuscript (pg 8, lines 14-16 and Fig 5H), consistent with our observation that overexpression of HDAC11 suppresses the expression of Nupr1 in both WT and L-DOXR cells, we show that HDAC11 inhibitor treatment enhances Nupr1 expression in WT cells, mirroring the unusually low expression of HDAC11 and high level of Nupr1 in L-DOXR cells. Conceivably, the increased Nupr1 expression reflects the reversion of promoter acetylation.

      Minor:

      Comment #10: In Figure 4L, how many animals or tumors were in each of the treatment arms? Were the weights of all the tumors recorded as well? It would be meaningful to add this data, if available. The authors keep changing gene nomenclature throughout the manuscript, listing the gene names in either capital letters or lowercase. This can be made consistent.

      Response: We used 6 mice per group, with one tumor per mouse, due to the tumor size of L-DOXR in the vehicle group. We also added new data showing the weights of the tumors in Figure supplement 2D. We apologize for the inconsistent gene names. Following the reviewer’s suggestion, the names of genes/proteins in the figures and legends have been changed to follow the recommendations of the HUGO Gene Nomenclature Committee (HGNC).

    2. Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors induced large doxorubicin-resistant (L-DOXR) cells by generating DOX gradients using their Cancer Drug Resistance Accelerator (CDRA) chip. The L-DOXR cells showed enhanced proliferation rates, migration capacity, and carcinogenesis. The authors then identified that the chemoresistance of L-DOXR cells is caused by failed epigenetic control of the NUPR1/HDAC11 axis.

      Strengths:

      - Chemoresistant cancer cells were generated using a novel technique and their oncogenic properties were clearly demonstrated using both in vivo and in vitro analysis.
      - The mechanisms of chemoresistance of the L-DOXR cells could be elucidated using in vivo chemoresistant xenograft models, an unbiased genome-wide transcriptome analysis, and a patient data/tissue analysis.
      - This technique has great capability to be used for understanding the chemoresistant mechanisms of tumor cells.

    3. eLife assessment

      This study, based on the use of the Cancer Drug Resistance Accelerator (CDRA) chip, is valuable as a platform technology to assess chemoresistance mechanisms. The strength is convincing from the technological point of view. However, the use of a single cell line model is a limitation. We acknowledge the authors' plan to further validate their current findings across multiple TNBC cell lines.

    4. Reviewer #1 (Public Review):

      Lim W et al. investigated the mechanisms underlying doxorubicin resistance in triple-negative breast cancer (TNBC) cells. They use a new multifluidic cell culture chamber to grow MB-231 TNBC cells in the presence of doxorubicin and identify a cell population of large, resistant MB-231 cells they term L-DOXR cells. These cells maintain resistance when grown as a xenograft model, and patient tissues also display evidence for having cells with large nuclei and extra genomic content. RNA-seq analysis comparing L-DOXR cells to WT MB-231 cells revealed upregulation of NUPR1. Inhibition or knockdown of NUPR1 resulted in increased sensitivity to doxorubicin. NUPR1 expression was determined to be regulated by HDAC11 via promoter acetylation. The data presented could be used as a platform to understand resistance mechanisms to a variety of cancer therapeutics.

    5. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Lim and colleagues use an innovative CDRA chip platform to derive doxorubicin-resistant (DOXR) MDA-MB-231 cells and mechanistically elucidate their molecular wiring. Given their enlarged morphology and polyploidy, they termed these cells Large-DOXR (L-DOXR). Through comparative functional omics, they deduce the NUPR1/HDAC11 axis to be essential in imparting doxorubicin resistance and, consequently, show that genetic or pharmacologic inhibition of NUPR1 restores sensitivity to the drug.

      Strengths:

      The study focuses on a major clinical problem of the eventual onset of resistance to chemotherapeutics in patients with triple-negative breast cancer (TNBC). They use an innovative chip-based platform to establish as well as molecularly characterize TNBC cells showing resistance to doxorubicin and uncover NUPR1 as a novel targetable driver of the resistant phenotype.

      Weaknesses:

      Critical weaknesses are the use of a single cell line model (i.e., MDA-MB-231) for all the phenotypic and functional experiments and absolutely no mechanistic insights into how NUPR1 functionally imparts resistance to doxorubicin. It is imperative that the authors demonstrate the broader relevance of NUPR1 in driving dox resistance using independent disease models.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      (1) Please expand methods with additional details related to cell co-culture, such as cell numbers and duration.

      We thank the reviewer for the careful reading and constructive suggestions, and we apologize for the confusion. We have added the experimental details related to co-culture in the revised manuscript (manuscript lines 551-553).

      (2) Please unify the writing of the abbreviation of small extracellular vesicles in the text, figure, and caption.

      Thank you for your comments. We have unified the abbreviation of extracellular vesicles to sEVs in the revised manuscript.

      (3) The effects of components other than sEVs in mechanically stimulated osteocyte CM on the proliferation of NSCLC cells should be evaluated.

      We evaluated the effects of SF, lEVs and sEVs in osteocyte CM on NSCLC cell proliferation under mechanical stimulation, and found that sEVs had the most obvious inhibition on NSCLC cell proliferation, as shown in the revised Supplemental Figure 4c, d.

      (4) In addition to osteocytes and osteoblasts, the effects of other types of cells on the proliferation of NSCLC cells should be detected. It is recommended to add at least one type of cell from an infrequent metastatic site of NSCLC as a negative control.

      We thank the reviewer for the suggestion. We added the NCM460 cell line (derived from intestinal epithelium) as a negative control and found that NCM460 had no significant effect on NSCLC cell proliferation, as shown in Figure 1d. These experiments were conducted before our last submission.

      (5) The bone microenvironment is complex. It is recommended to evaluate the effect of bone marrow-derived sEVs on NSCLC to validate whether the tumor suppressive effect of osteocyte sEVs is unique.

      We thank the reviewer for the suggestion. We agree with the reviewer’s comments that the bone microenvironment is complex. We explored the effect of bone marrow-derived sEVs on NSCLC cell proliferation and found that bone marrow-derived sEVs promoted NSCLC cell proliferation, as shown in Supplemental Figure 2g, h in the revised manuscript.

      (6) The description of exercise preconditioning is not clear enough. It is recommended to supplement the pattern diagram to improve readability. Exercise preconditioning should be further discussed by the Authors.

      Thank you for your comments, and we apologize for the confusion. We have added the pattern diagram of the exercise preconditioning in Supplemental Figure 6a.

      Reviewer #2 (Recommendations For The Authors):

      (1) The histological images are analyzed in a qualitative manner, with no description of the methodology used. A quantitative assessment of the distance and level of Ki-67+ NSCLC cells needs to be performed in human and murine tissues. Because in bone metastases cancer cells are frequently mixed with bone marrow cells, the inclusion of a cell marker to identify NSCLC cells is needed for proper interpretation of the imaging data.

      We thank the reviewer for the careful reading and constructive suggestions. We conducted the suggested quantitative assessment and described the methodology in the revised manuscript. The results showed that Ki-67 was lower in tumor cells adjacent to bone tissue than in the surrounding tumor cells (Figure 1a, b).

      In order to effectively identify NSCLC cells in bone metastases, GFP-expressing NSCLC cells were used in the animal model. We have added the immunofluorescence analysis of GFP and CCND3 in Supplemental Figure 4e, 4g, 5 and 6b.

      (2) The authors rely on KI-67 as a marker of proliferation. Yet, it is intriguing that some osteocytes, non-proliferating cells by definition, are often positive for this marker, which questions the specificity of the staining. The authors should provide the proper immunostaining controls to check for specificity and use additional markers of proliferation to confirm these results.

      We thank the reviewer for the suggestions. Ki-67 staining was widely used to determine the dormancy of tumor cells in previous studies [1-4]. To confirm the results of Ki-67 staining, we used cyclin D3 (CCND3) as an additional marker of proliferation, as suggested by the reviewer. We added the immunofluorescence analysis of CCND3 in Supplemental Figure 4e, 4g, 5 and 6b, which is consistent with the result of the quantitative immunofluorescence analysis of Ki-67.

      (3) The lack of proper controls in the in vivo experiments makes the interpretation of the data difficult. For instance, in the preconditioning experiment, it is likely that the bone mass increases. Thus, these mice start with higher bone mass than the control mice. The lack of a proper control (naive mice exposed to moderate exercise) does not allow testing if the presence of cancer cells still promotes bone loss in this group. The authors need to include naive mice or analyze the bones from the non-injected contralateral legs.

      We thank the reviewer for the thoughtful comments, and we apologize for the confusion. We absolutely agree with the reviewer that the bone mass increases after exercise preconditioning. Multiple tissues and organ systems are affected by exercise, initiating diverse homeostatic responses. Although exercise preconditioning effectively suppressed bone metastasis progression of NSCLC, as mentioned in the previous manuscript, we cannot immediately conclude that this effect depends entirely on osteocytes. The mechanism by which exercise preconditioning suppresses bone metastasis progression is complex and still needs further exploration. The revised manuscript has expanded the discussion of this area (manuscript lines 326-328).

      (4) Further, validating the in vivo work with other osteocyte-like cells or primary osteocytes would have strengthened the results.

      We thank the reviewer for the suggestion. We co-cultured MLO-A5 cells (another osteogenic cell line) with NSCLC cells, as shown in Supplemental Figure 1g. As expected, MLO-A5 cells also had an inhibitory effect on the proliferation of NSCLC cells.

      (5) The data on miRNA99b-3p on NSCLC in Supplementary Figure 3 is not convincing. The positive cells are difficult to see and most of the osteocytes lack nuclei. Better data, in humans and the mouse model, is needed to confirm that osteocytes produce miRNA99b-3p.

      We thank the reviewer for the comments, and we apologize for the confusion. In this study, we used miRCURY LNA miRNA detection probes for ISH without staining the nuclei in the tissues, a method that has been used in previous studies by us and others [5-7]. Detailed experimental procedures for ISH of miRNA have been added in the revised manuscript (manuscript lines 461-474).

      (6) The authors do not provide a piece of data supporting that osteocytes are responsible for any of the effects seen by the interventions done in the in vivo models. Osteocytes, as well as other bone cells, can respond to mechanical stimulation and thus could virtually be responsible for the protective effects of mechanical loading or moderate exercise. In vivo experiments demonstrating a direct role of osteocytes-produced miRNA99b-3p are needed to support the notion that osteocytes maintain tumor dormancy in NSCLC bone metastasis.

      We thank the reviewer for the thoughtful comments and suggestion. We constructed an in vivo model by injecting antagomiR-NC or antagomiR-99b-3p in combination with mechanical loading [8]. The results showed that the injection of antagomiR-99b-3p could partially but effectively rescue the inhibitory effect on NSCLC cell proliferation (Figure 4i-k).

      (7) Further, the authors solely rely on Ki-67 as a marker of dormancy. Completing this analysis with an assessment of a dormant gene expression signature or in vivo studies assessing tumor dormancy directly would be needed to confirm this notion.

      We thank the reviewer for the suggestion. We conducted the suggested experiment by using CCND3 as an additional dormancy marker. We added the immunofluorescence analysis of CCND3 in Supplemental Figure 4e, 4g, 5 and 6b, which is consistent with the result of the quantitative immunofluorescence analysis of Ki-67.

      References

      [1] Guba M, Cernaianu G, Koehl G et al. A primary tumor promotes dormancy of solitary tumor cells before inhibiting angiogenesis. Cancer Res, 2001, 61: 5575-9.

      [2] Bliss Sarah A, Sinha Garima, Sandiford Oleta A et al. Mesenchymal Stem Cell-Derived Exosomes Stimulate Cycling Quiescence and Early Breast Cancer Dormancy in Bone Marrow. Cancer Res, 2016, 76: 5832-5844.

      [3] Correia Ana Luísa, Guimaraes Joao C, Auf der Maur Priska et al. Hepatic stellate cells suppress NK cell-sustained breast cancer dormancy. Nature, 2021, 594: 566-571.

      [4] Hu Jing, Sánchez-Rivera Francisco J, Wang Zhenghan et al. STING inhibits the reactivation of dormant metastasis in lung adenocarcinoma. Nature, 2023, 616: 806-813.

      [5] Song Qiancheng, Xu Yuanfei, Yang Cuilan et al. miR-483-5p promotes invasion and metastasis of lung adenocarcinoma by targeting RhoGDI1 and ALCAM. Cancer Res, 2014, 74: 3031-42.

      [6] Carotenuto Pietro, Hedayat Somaieh, Fassan Matteo et al. Modulation of Biliary Cancer Chemo-Resistance Through MicroRNA-Mediated Rewiring of the Expansion of CD133+ Cells. Hepatology, 2020, 72: 982-996.

      [7] Lv Yan, Wang Yin, Song Yu et al. LncRNA PINK1-AS promotes Gαi1-driven gastric cancer tumorigenesis by sponging microRNA-200a. Oncogene, 2021, 40: 3826-3844.

      [8] Zhang Yun, Li Shuaijun, Jin Peisheng et al. Dual functions of microRNA-17 in maintaining cartilage homeostasis and protection against osteoarthritis. Nat Commun, 2022, 13: 2447.

    2. eLife assessment

      This is an important study that adds to the field a new understanding of exercise or mechanical loading, microRNAs, and secreted extracellular vesicles in lung cancer (NSCLC), which may have relevance to other osteolytic cancers. The strength of the evidence was mixed: whereas in vitro microRNA experiments were convincing, other elements were incomplete (e.g., proving the roles of osteocytes, as opposed to other mechanosensitive cells, in vivo). This work would be of broad interest to those investigating osteolytic cancers, and the role of exercise in bone cancer, preclinically.

    3. Reviewer #1 (Public Review):

      Xie and colleagues propose here to investigate the mechanism by which exercise inhibits bone metastasis progression. The authors describe that osteocytes, sensing mechanical stimulation generated by exercise, inhibit NSCLC cell proliferation and sustain the dormancy thereof by releasing sEVs with tumor suppressor microRNAs. Furthermore, mechanical loading of the tibia inhibited the bone metastasis progression of NSCLC. Interestingly, exercise preconditioning effectively suppressed bone metastasis progression.

    4. Reviewer #2 (Public Review):

      In this manuscript, Xie and colleagues investigate the contribution of osteocytes to bone metastasis of non-small cell lung carcinoma (NSCLC) using a combination of clinical samples and in vitro and in vivo data. They find that metastatic NSCLC cells exhibit lower levels of the proliferation markers Ki-67 and CCND3 when located in areas adjacent to the bone surface in both NSCLC patients and an intraosseous animal model of NSCLC. Using in vitro approaches, they show that osteocyte-like cells inhibit the proliferation of NSCLC cells through the secretion of small extracellular vesicles (sEVs). They identify miR-99b-3p as a component of sEVs and demonstrate that miR-99b-3p inhibits the proliferation of NSCLC cells by targeting the transcription factor MDM2. Interestingly, the data also show that mechanical stimulation of osteocytes enhances the inhibitory effect of osteocytes on NSCLC cell proliferation via increasing sEVs release. By performing different in vivo studies, the authors show that tibial loading and moderate exercise (treadmill running), before and after tumor cell inoculation, suppress tumor progression in bone and protect bone mass. Intriguingly, the moderate exercise regime shows additive/synergistic effects with the co-administration of anti-resorptive therapy. These data add to the growing evidence pointing towards osteocytes as important cells of the tumor microenvironment capable of influencing the progression of tumors in bone.

      The conclusions of the paper, however, are not well supported by the data, and some critical aspects of image analysis and data analysis need to be clarified and extended.

      (1) In Figure 1, the authors rely on Ki-67 as a marker of proliferation. Yet, it is intriguing that some osteocytes, non-proliferating cells by definition, are often positive for this marker, which questions the specificity of the staining. The data displayed in supplementary figures showing CCND3 as a marker of proliferation, and GFP as a marker of cancer cells, is much more robust and should be moved to the main figures.

      (2) Adding control groups to fully assess the impact of the in vivo interventions (tibial loading, moderate exercise, anti-resorptive therapy) on bone mass would be needed. The authors should have used naive mice or analyzed the bones from the non-injected contralateral legs.

      (3) The data on miRNA99b-3p on NSCLC in Supplementary Figure 3 is not convincing. The positive cells are difficult to see and most of the osteocytes lack nuclei. Better data, in humans and the mouse model, would have helped to confirm that osteocytes produce miRNA99b-3p.

      (4) Some conclusions of the paper are not entirely supported by the data provided. Osteocytes, as well as other bone cells, can respond to mechanical stimulation and thus could virtually be responsible for the protective effects of mechanical loading or moderate exercise. While blocking miR-99b-3p with antagomiRs rescued the decreases in proliferation, it is unclear whether this effect is mediated by osteocytes or other cells that express this miRNA. In vivo experiments demonstrating a direct role of osteocytes are needed to support the notion that osteocytes maintain tumor dormancy in NSCLC bone metastasis. In vivo studies assessing tumor dormancy directly would be needed to confirm that osteocytes promote cancer cell dormancy.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans. In this manuscript, the authors generated TRIP13 null mice and Flag-tagged TRIP13 knock-in mice to study its role in meiosis. They demonstrate that TRIP13 regulates HORMA domain proteins and is essential for meiotic completion and fertility. The main impact of this manuscript is its clarification of the in vivo function of TRIP13 during mouse meiosis and its previously unrecognized role as a dose-sensitive regulator of meiosis.

      Strengths:

      Two previously reported Trip13 mutations in mice are both hypomorphic alleles with distinct phenotypes, precluding a conclusion on its function. This study for the first time generated the TRIP13 null mice, definitively revealing the function of TRIP13 in meiosis. The authors also show the novel localization of TRIP13 at SC and its independence from the axial element components. The finding of dose-sensitive regulation of meiosis by TRIP13 has implications in understanding human meiosis and disease phenotypes.

      Weaknesses:

      This manuscript would be more impactful if more mechanistic advancements could be made. For example, the authors could follow up with one of the new interactors identified by MS to offer new insight into the molecular function of TRIP13.

      We agree that it would be interesting to follow up on new candidate interactors but think that it would be more feasible to follow up on them in future studies.

      Reviewer #2 (Public Review):

      Summary and Strengths:

      In this manuscript, Chotiner and colleagues demonstrated the localization of TRIP13 and clarified the phenotypes of Trip13-null mice in mouse meiosis. The meiotic phenotypes of Trip13 have been well characterized using the hypomorph alleles in the literature. However, the null phenotypes have not been examined, and the localization of TRIP13 was not clearly demonstrated. The study fills these important knowledge gaps in the field. The demonstration of TRIP13 localization to SC in mice provides an explanation of how HORMA domain proteins are evicted from SC in diverse organisms. This conclusion was confirmed in both IF and TRIP13-tagged Tg mice. Further, the phenotypes of Trip13-null mice are very clear. The manuscript is well crafted, and the discussion section is well organized and covers the topic comprehensively. All in all, the manuscript will provide important knowledge in the field of meiosis.

      Weaknesses:

      The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. However, the authors did not examine meiotic recombination in the Trip13-null mice.

      Meiotic recombination was extensively characterized in Trip13 severe hypomorph mutants in two previous studies: gamma-H2AX, BLM, BRCA1, ATR, RPA, RAD51, DMC1, MLH1 (Li and Schimenti, 2007; Roig et al., 2010). All the meiotic defects in our Trip13-null mice were also present in Trip13 severe hypomorph mutants: meiotic arrest, defects in chromosomal synapsis, asynapsis at chromosomal ends, and accumulation of HORMAD1/2 on the SC axis. Therefore, the defects in meiotic recombination in Trip13-null mice are expected to be similar to those in Trip13 severe hypomorph mutants, and thus we did not examine the proteins involved in meiotic recombination in the Trip13-null mutant.

      Reviewer #3 (Public Review):

      Summary:

      The authors perform a thorough examination of the phenotypes of a newly generated Trip13 null allele in mice, noting defects in chromosome synapsis and impact on localization of other key proteins (namely HORMADs) on meiotic chromosomes. The vast majority of data confirms observations of several prior studies of Trip13 alleles (moderate and severe hypomorphs). The original or primary aims of the study aren't clear, but it can be assumed that the authors wanted to better study the role of this protein in evicting HORMADs upon synapsis by studying phenotypes of mutants and better characterizing TRIP13 localization data (which they find localizes to the central element of synapsed chromosomes using a new epitope-tagged allele). Their data confirm prior reports and are consistent with localization data of the orthologous Pch2 protein in many other organisms.

      Strengths:

      The quality of data is high. Probably the most important data the authors find is that TRIP13 is localized along the CE of synapsed chromosomes. However, this was not unexpected because PCH2 is also similarly localized. Also, the authors use a clear null (deletion allele), whereas prior studies used hypomorphs.

      Weaknesses:

      There is limited new data; most are confirmatory or expected (i.e., SC localization), and thus the impact of this report is not high. The claim that TRIP13 "functions as a dosage-sensitive regulator of meiosis" is exaggerated in my opinion. Indeed, the authors make the observation that hets have a phenotype, but numerous genes have haploinsufficient phenotypes. In my opinion, it is a leap to extrapolate this to infer that TRIP13 is a "regulator" of meiosis. What is the definition of a meiosis regulator? Is it at the apex of the meiosis process, or is it a crucial cog of any aspect of meiosis?

      TRIP13 is not haploinsufficient, as Trip13 heterozygotes were still viable and fertile (albeit with defects in meiosis). TRIP13 is an ATPase and changes the conformation of meiosis-specific proteins such as HORMAD proteins. TRIP13 is essential for meiosis and its mutations cause defects in both meiotic recombination and chromosomal synapsis. Reviewer 1 stated that “TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans”. Therefore, we feel that TRIP13 can be called a regulator of meiosis.

      Reviewer #1 (Recommendations For The Authors):

      A schematic illustration of SC structure, the components involved, and the main finding, would be helpful for readers to better understand the advancement made by this study.

      We have now added a schematic illustration in a new panel - Figure 7C.

      Fig. 1B, the stage with diplotene cells should be XII.

      The pachytene cells (Pac) were mis-labelled as diplotene cells. Corrected.

      Fig. 1C, color mislabeled.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript will provide important knowledge in the field of meiosis. I support the publication of this study. I have some suggestions to improve and polish the manuscript.

      Major points:

      (1) The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. Given the function of HORMAD1 in meiotic recombination, it would be informative if the authors could examine how major markers of meiotic recombination behave in Trip13-null meiosis.

      Please see our response to Weaknesses from Reviewer #2.

      (2) Relating to the above point, the complete lack of synapsis on the sex chromosomes in the Trip13-null meiosis is impressive. This result raises a question as to whether the pathway to designate XY-obligatory crossover (which can be detected with large foci of ANKRD31 and MEI4/REC114 at PAR) is affected or not. It would be interesting to examine whether the ANKRD31 and MEI4/REC114 foci are present on PAR in Trip13-null meiosis.

      We have performed immunofluorescent analysis of REC114 in spermatocytes. In Trip13-null pachytene-like spermatocytes, X and Y chromosomes are not synapsed. REC114 still formed one focus each on the unsynapsed X and Y chromosomes. We have added this new data in the Results as a new supplementary figure (Figure 4 -supplement 1).

      (3) Figure 4 can be improved if there are quantified data for each phenotype. These phenotypes look nearly complete, but it would be informative to show the penetrance of these phenotypes.

      Because some chromosomes have unsynapsed ends, resulting in two centromere or telomere foci, the total number of centromere or telomere foci is always higher in Trip13-null pachytene-like spermatocytes than wild type pachytene spermatocytes. Therefore, we did not count the foci of centromeres and telomeres. Consistently, the centromere and telomere markers localized as expected in both wild type and Trip13-null spermatocytes.

      (4) I am not fully convinced by these photos: "synapsed sister chromatids (Figure 6B)" and "Sycp2-/- spermatocytes formed short stretches of synapsis (Figure 6C)". The authors may try confocal microscopy with super-resolution deconvolution as they did for other data.

      These have been previously demonstrated. The “synapsed sister chromatids (Figure 6B)” were previously demonstrated by confocal microscopy with super-resolution deconvolution (Guan et al., 2020). The short stretches of synapsis in Sycp2-/- spermatocytes were previously demonstrated by electron microscopy (tripartite SC structure) and SYCP1 immunofluorescence (Yang et al., 2006). We have revised the text by citing the previous evidence and the publications.

      Minor points:

      (1) Lines 19-21: "Loss of TRIP13 leads to meiotic arrest and thus sterility in both sexes. Trip13-null meiocytes exhibit abnormal persistence of HORMAD1 and HORMAD2 on synapsed SC". These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles. This information can be added to the abstract. Otherwise, it sounds like these are totally new findings, as written.

      This information is now added to the abstract: “These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles.”

      (2) The introduction section seems too long and contains unnecessary information. Some molecular details that are not touched in the result section can be deleted (e.g., Line 65-73).

      We would like to keep the molecular details on the two conformation states, as it provides biochemical background on TRIP13-HORMAD interactions.

      (3) Introduction, Line 92. A rationale can be added as to why the authors characterized the Trip13-null allele.

      A rationale has been added as follows: “To determine the effect of complete loss of TRIP13, we characterized Trip13-null mice.”

      (4) Line 205: Typo "TRRIP13".

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      Just a few recommendations:

      (1) In my opinion, the title is an overreach. "Regulator" invokes other concepts such as transcription factors.

      Please see our explanation in response to weaknesses from Reviewer #3.

      (2) The first sentence of the results deals with TRIP13 expression in only 3 tissues. The authors might look at more comprehensive RNA-seq data from mice and humans.

      We examined TRIP13 protein expression in 8 mouse tissues by WB and found that TRIP13 protein was abundant in testis but present at a very low level in ovary and liver (Figure 1A). We feel that readers can easily look up the relative transcript levels of Trip13 in more tissues from mice and humans from NCBI database under “Gene”.

      (3) The null allele is semi-lethal. Is body size affected? Were the mice abnormal in any other ways, given that TRIP13 has been implicated in other diseases and processes, and is expressed in other tissues (TRIP13 stands for Thyroid receptor interacting protein).

      The body weight of 2-3 month-old males was not significantly different between wild type (24.3±2.8 g, n=5) and Trip13 KO mice (22.8±1.7 g, n=5, p=0.3, Student’s t-Test). We have included the body weight information in the revised manuscript. We did not observe somatic defects in the viable Trip13-null mice, nor did the authors report any in the Trip13 hypomorph mutants in two previous studies (Li and Schimenti, 2007; Roig et al., 2010).
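
      As a rough check on the comparison above, the sketch below recomputes a pooled-variance (Student's) t-test from the reported summary values alone. It assumes that the ± values are standard deviations and that the test was two-sided; neither is stated in the response, and the function names are hypothetical. The p-value is obtained by numerically integrating the t density, so it is an approximation.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, degrees of freedom) for an equal-variance t-test.

    Assumes sd1 and sd2 are sample standard deviations (an assumption here).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


if __name__ == "__main__":
    # Reported values: wild type 24.3±2.8 g (n=5), Trip13 KO 22.8±1.7 g (n=5)
    t_stat, dof = pooled_t_from_summary(24.3, 2.8, 5, 22.8, 1.7, 5)
    print(f"t = {t_stat:.3f}, df = {dof}, p = {two_sided_p(t_stat, dof):.2f}")
```

      Under these assumptions the t statistic is about 1.02 with 8 degrees of freedom, which is in line with the reported p=0.3. If the ± values were standard errors instead, the result would differ.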

      (4) Line 276 : It would be nice to elaborate on the "spatial explanation."

      We meant that TRIP13 localizes to SC while HORMAD proteins are removed from SC upon chromosomal synapsis, thus providing a spatial explanation. However, we have now deleted “spatial”.

    2. eLife assessment

      This important study defined the physiological function of a conserved meiosis factor during murine spermatogenesis. The genetic and cellular biological evidence supporting the conclusion is convincing. This work will be of broad interest to cell biologists, geneticists, and reproductive biologists.

    3. Reviewer #1 (Public Review):

      Summary:

      TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans. In this manuscript, the authors generated TRIP13 null mice and Flag-tagged TRIP13 knock-in mice to study its role in meiosis. They demonstrate that TRIP13 regulates HORMA domain proteins and is essential for meiotic completion and fertility. The main impact of this manuscript is its clarification of the in vivo function of TRIP13 during mouse meiosis and its previously unrecognized role as a dose-sensitive regulator of meiosis.

      Strengths:

      Two previously reported Trip13 mutations in mice are both hypomorphic alleles with distinct phenotypes, precluding a conclusion on its function. This study generated TRIP13 null mice for the first time and definitively revealed the function of TRIP13 in meiosis. The authors also show novel localization of TRIP13 at SC and its independence from the axial element components. The finding of dose-sensitive regulation of meiosis by TRIP13 has implications for understanding human meiosis and disease phenotypes.

      The results support the main conclusions and advance the understanding of meiosis in the germline.

    4. Reviewer #2 (Public Review):

      Summary and Strengths:

      In this manuscript, Chotiner and colleagues demonstrated the localization of TRIP13 and clarified the phenotypes of Trip13-null mice in mouse meiosis. The meiotic phenotypes of Trip13 have been well characterized using the hypomorph alleles in the literature. However, the null phenotypes have not been examined, and the localization of TRIP13 was not clearly demonstrated. The study fills these important knowledge gaps in the field. The demonstration of TRIP13 localization to SC in mice provides an explanation of how HORMA domain proteins are evicted from SC in diverse organisms. This conclusion was confirmed in both IF and TRIP13-tagged Tg mice. Further, the phenotypes of Trip13-null mice are very clear. The manuscript is well crafted, and the discussion section is well organized and covers the topic comprehensively. All in all, the manuscript will provide important knowledge in the field of meiosis.

      Weaknesses:

      The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. However, the authors did not examine meiotic recombination in the Trip13-null mice.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      However, there are several concerns to be explained more in this study. In addition, some results should be revised and updated.

      Thank you for your comments. The concerns were addressed through revised descriptions and additional experiments.

      Some results were revised and updated accordingly.

      Reviewer #2 (Public Review):

      The minor weaknesses of the study are inconsistent use of terminology throughout the manuscript, occasional logical jumps in the flow, and missing detailed descriptions of the methodologies used, either in the text or in the Materials and Methods section, all of which can be easily rectified.

      Thank you for your review. We have revised the manuscript and corrected errors according to your comments.

      Reviewer #3 (Public Review):

      Importantly, besides the Miwi ubiquitination experiment, which is performed in a heterologous system and therefore may not be ideal for drawing conclusions, the possible involvement of ubiquitination was not shown for any other proteins that the authors found to interact with FBXO24. Could histones and transition proteins be targets of the proposed ubiquitin ligase activity of FBXO24, such that histone replacement is abrogated in its absence?

      Thank you for your comments. The histones and transition proteins were not found in the immunoprecipitates of FBXO24, suggesting they are not direct targets of FBXO24, as shown in Figure S3G.

      Miwi should be immunoprecipitated and Miwi ubiquitination should be detected (with WB or mass spec) in WT testis.

      We agree with this suggestion. In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      Therefore, the claim that FBXO24 is essential for piRNA biogenesis/production (lines 308, 314) is not appropriately supported.

      We appreciate the comment. We have revised the description and modified the claim on page 11.

      Reviewing Editor's note for revision

      (1) As noted by all three reviewers, as currently written the rationale to focus on MIWI is not entirely clear. A transitional narrative to focus on MIWI needs to be provided as well as an explanation for how the absence of FBXO24 as an E3 ubiquitin ligase is responsible for the observed mRNA and protein differential expression.

      We appreciate your comments. We have supplemented the transitional narrative by focusing on MIWI and explained mRNA and protein differential expression upon FBXO24 deletion, shown on Page 7 and Page 13, respectively.

      (2) As it can be indirect, mass spec detection of MIWI in testis co-IP and MIWI ubiquitination should be detected (with WB or mass spec) in WT testis.

      In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      (3) Please tone down the claim that FBXO24 is essential for piRNA biogenesis/production as it requires further evidence.

      We have revised the description and modified the claim on page 11.

      (4) Ontology analysis of the genes with abnormally spliced mRNAs to provide an explanation for developmental defects.

      In the revision, we have performed the ontology analysis and provided new data regarding the abnormally spliced genes, as shown in Figure S4D.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      (1) The authors performed mainly with the WT (or knock-in) and Fbxo24-knockout mouse model. Do the heterozygous males and their sperm have any physiological defects like FBXO24-deficient mice?

      This is a good question. We did the phenotype analysis and found that heterozygous males are all fertile, and their sperm do not have any physiological defects.

      (2) Fbxo24-KO sperm carries swollen mitochondria. How do the mitochondria affect sperm function?

      Thank you for raising this interesting question. Based on our data and published literature, the defective mitochondria were associated with energetic disturbances and reduced sperm motility, as shown on Page 12.

      (3) TEM images show that Fbxo24-KO spermatids carry swollen mitochondria and enlarged chromatoid bodies. How the swollen mitochondria and enlarged chromatoid bodies impair sperm motility and flagellar development requires more explanation. In addition, it is unclear how the enlarged diameter of the chromatoid body is critical for normal sperm development.

      Thank you for your comments. The chromatoid bodies are considered to be engaged in mitochondrial sheath morphogenesis. Analysis of the chromatoid bodies' RNA content reveals enrichment of PIWI-interacting RNAs (piRNAs), further emphasizing the role of the chromatoid bodies in post-transcriptional regulation of spermatogenetic genes. We added this explanation on Page 12-13.

      (4) The authors only show band images to compare the protein amounts between WT and KO sperm and round spermatids. As the blots for loading controls are not clear, the authors should quantify the protein levels and perform a statistical comparison.

      We quantified the protein levels and performed a statistical comparison, as shown in Figure S3B.

      (5) The authors show the defective sperm head structure from Fbxo24-KO sperm in Figure 5. However, the Fbxo24-KO sperm heads seem quite normal in Figure 3. How many sperm show defective sperm head structure? In addition, the authors observed altered histone-to-protamine conversion in sperm, but it is unclear whether the altered nuclear protein conversion causes morphological defects in the sperm head.

      We appreciate the comments. In our study, we found over 80% of Fbxo24 KO sperm showed defective structure in the sperm head. Altered histone-to-protamine conversion caused the decondensed nucleus of Fbxo24 KO sperm. Notably, in many knockout mice studies, impaired chromatin condensation is frequently associated with abnormal sperm head morphology, as shown in reference 15 of Page 8.

      (6) The authors compare the protein levels of RNF8, PHF7, TSSK6, which participate in nuclear protein replacement in sperm. However, considering that sperm is the endpoint of the nuclear protein conversion, it is unclear why the protein levels are compared in mature sperm. The authors might want to compare the protein levels in developing germ cells.

      Thank you for your comment. Yes, we actually detected the protein levels of RNF8, PHF7, and TSSK6 in the testes, not in sperm. We have corrected it in Figure 5E. We apologize for our carelessness.

      (7) This reviewer suggests describing in more detail the rationale for focusing on the MIWI protein. Also, was MIWI detected by testis co-IP mass spectrometry?

      We agree with this suggestion. Since MIWI was a core component of CB and also identified as an FBXO24-interacting partner from our immunoprecipitation-mass spectrometry (IP-MS) (Table S1), we focused on the examination of MIWI expression between WT and Fbxo24 KO testes. We have added this description in the revision (see lines 191-193 on page 7).

      (8) The authors need to provide a more detailed explanation for how the altered piRNA production affects physiological defects in germ cell development. In addition, it will be good to describe more how the piRNAs affect a broad range of mRNA levels.

      Thank you for your comments. The previously published studies have demonstrated that piRNAs could act as siRNAs to degrade specific mRNAs during male germ cell development and maturation. We have cited these studies on lines 369-372 of Page 13.

      (9) The authors observed an altered splicing process in the absence of FBXO24. However, it is a little bit confusing how the altered splicing events affect developmental defects. Therefore, the authors should state which mRNAs have undergone abnormal splicing processes and provide ontology analysis for the genes.

      We have performed the ontology analysis and showed the new data in Figure S4D.

      Minor comments

      (1) Figure 1A-C - Statistical comparison is missing. Numbers for biological replication should be described in corresponding legends.

      Thank you for your careful review. We have provided the statistical comparison and the numbers for biological replication in the legends of Figure 1A-C.

      (2) Figure 1E, F - Current images can't clearly resolve the nuclear localization of the FBXO24 testicular germ cells. To clarify the intracellular localization, the authors should provide images with higher resolution.

      The resolution of Figure 1E, F was improved, as suggested. Thank you!

      (3) Figure 1E, F - Scale bar information is missing.

      The scale bars of Figure 1E, F were provided.

      (4) It will be much better to show the predicted frameshift and early termination of the protein translation in Fbxo24-knockout mice.

      The predicted frameshift of Fbxo24-knockout mice was added and shown in Figure S1B.

      (5) It is required to provide primer information for qPCR.

      The primer information for qPCR was provided, as shown in Table S7.

      (6) The authors describe that Fbxo24-KO sperm show abrupt bending of the tail. However, the description is unclear and the sperm shown in Figure 3C seems quite normal. The authors should clarify the abnormal bending pattern of the tail and show quantified results.

      Thank you for pointing out this issue. In Fbxo24 KO sperm, abnormal bending of the sperm tails mainly included neck bending and midpiece bending. We have shown them in Figure S3A.

      (7) The authors mention that Fbxo24-KO sperm have swollen mitochondria at the midpiece, but this is also unclear. How many mitochondria are swollen in Fbxo24-KO sperm?

      This is a good question. However, since it is very difficult to observe all of the mitochondria in each sperm using the electron microscope, we could not quantify the swollen mitochondria in Fbxo24 KO sperm.

      (8) Scale bar information is missing - Fig 3C insets, Fig 3D, Fig 3F insets, 4A insets, Figure 4C insets.

      All the scale bars have been added.

      (9) How many sperm have annulus defects? In Figure 3F, WT sperm does not have an annulus, which could be damaged during sample preparation. Are the annulus defects in Fbxo24-KO sperm consistent?

      Thank you for asking these questions. Based on our results, about 30% of Fbxo24 KO sperm showed defective annulus structure. Since both TEM (Figure 3F) and SEM (Figure 3G) results clearly showed the defective annulus structure of Fbxo24 KO sperm, we believe the annulus defects are consistent and highly unlikely to be caused by sample preparation.

      (10) A Cross-section image for the endpiece of Fbxo24-KO sperm is not suitable. There is a longitudinal column structure of the principal piece.

      Thank you for your comments. It is difficult to observe a completely longitudinal structure of the sperm tail under TEM. The cross-sections of the endpiece and principal piece allowed us to examine the structure of the axoneme, ODFs and fibrous sheath (FS).

      (11) The endpiece of Fbxo24-KO sperm seems to have a normal axoneme. Do all endpieces of Fbxo24-KO sperm have normal axoneme? Also, the authors need to describe whether an axonemal structure is damaged and disrupted in all Fbxo24-KO sperm.

      Our TEM data showed the axonemal structure was impaired in the endpiece of Fbxo24 KO sperm (See right panels of Figure 3H). Moreover, based on the ultrastructure analysis of TEM, we found over 90% of Fbxo24 KO sperm had a damaged axonemal structure.

      (12) Reference blots in Fig 3I, 3J, 4E (left), 5C and 5E are quite faint. The authors should replace the blot images.

      Thank you for pointing this out. We have rerun the Western blots multiple times but could not obtain better images due to antibody sensitivity. However, we quantified the protein levels and performed a statistical comparison, as shown in Figure S3B, to establish a good readout from these images for the readers.

      (13) Loading controls are required - 7D-H.

      Done as suggested. Thanks!

      (14) How do the authors measure the midpiece length? From where to where? This should be clarified.

      Good question. We measured the midpiece length from the sperm neck to the sperm annulus by MitoTracker staining. We have clarified this on Page 16.

      (15) How are the bands for Fbxo24 shifted during IP in Fig 7A?

      The protein modification in the interaction may cause the band shift.

      (16) There are several typos throughout the manuscript. Please check carefully and fix them.

      Thank you for your careful review. We have corrected and fixed all the typos as far as we can.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      (1) Please provide a schematic of HA-Fbxo24 knock-in construct and strategy together with knockout (Figure S2) or even separately early in Figure S1. The description of using the transgenic mouse is mentioned even earlier than the knockout but there are no citations or methods provided in the text other than that listed in Materials and Methods.

      Thank you for your suggestion. As suggested, the schematic of the HA-Fbxo24 knock-in strategy has been supplemented in Figure S2A. The description of using the transgenic mouse has been added to the results, as shown on lines 102-103 of page 4.

      Also, it is not clear to what extent the phenotypic and molecular characterization of HA-transgenic mice is performed. For example, Lines 134-139: The use of Fbxo24-HA labeled transgenic mice results in the rescue of spermatogenesis and fertility as shown in Figure 2F by measuring the litter size. It is not clear how this observation leads the author to state that this rescues defects in spermiogenesis. Please clarify how and what other measures are taken to support this conclusion. Is the observed infertility due to defects in spermatogenesis or spermiogenesis?

      Thank you for your question. We crossed FBXO24-HATag males with FBXO24−/− females to obtain FBXO24−/−; FBXO24-HATag males. We examined the testes volume and histological morphology of FBXO24−/−; FBXO24-HATag males and found that they were similar to FBXO24+/−; FBXO24-HATag littermates, indicating that spermatogenesis was restored, as shown in Figure S2H.

      (2) Line 107 vs Line 114: Please use the terminology spermatogenesis and spermiogenesis consistently throughout the text. Earlier in the introduction, the authors clearly defined that spermatogenesis involves three phases, with the third phase referred to as spermiogenesis. However, the author concludes in the first line that "FBXO24 plays a role during spermatogenesis" while summarizing at the end of the paragraph that this protein is "expressed in haploid spermatids specifically during spermiogenesis". Therefore, it is not clear whether the authors conclude that FBXO24 is important for all of spermatogenesis (line 107) or only for part of spermiogenesis (line 114). Another example is line 219 vs. 238: At this point in the manuscript, it is again unclear whether the authors want to study molecular changes during spermatogenesis or spermiogenesis upon FBXO24 depletion. There are many such cases throughout the text, and it is recommended to consistently use the more restrictive terminology whenever applicable for a clear interpretation.

      We thank you for your careful review. We have double-checked the terminology of spermatogenesis and spermiogenesis and made it consistent throughout the text of the revised manuscript.

      (3) It is not clear how rampant/frequent the Fbxo24-knockout sperm show defects in head morphology based on Figures 3C, 3F, and 5A since it seems that there are some sperm showing relatively normal-looking sperm heads. Please provide quantification.

      We have performed the quantification and found that over 80% of Fbxo24 KO sperm showed defective structures in the sperm head.

      (4) Figure 3B: The authors describe in the figure legend that 3 mice were analyzed in each group. The standard deviation for the WT analysis is missing, or if the author wanted to set the WT value to 100%, the bar and scale shown on the y-axis do not fit. The value for WT looks more like 95%.

      We have indeed analyzed sperm motility based on the WT value set at 100% and have revised Figure 3B in the revision. We apologize for this oversight.

      (5) Figure 3 B and C: It is not clear how the motility is measured. Is CASA used (not described in Methods)? The conclusion about abnormal flagellar bending in KO spermatozoa cannot be drawn from the static microscopic images alone. Please provide more details of motility analysis together with videos of live cell imaging.

      The sperm motility was measured manually using a hemocytometer, according to the reference.

      We provided the details of sperm motility analysis in the Materials and Methods section on Page 16.

      (6) Figure 3 I and J: These are among the few figures that are not supported by statistical analysis. In particular, for 3I, GAPDH controls of WT and KO protein do not show equal loading, which could explain the lower expression of the KO protein. Please show normalized bar graphs with multiple biological replicates or at least show a representative technical replicate that shows equal loading of GAPDH to better support the conclusion.

      Thank you for your suggestion. Statistical comparison of relative protein expression was supplemented, as shown in new Figure S3B.

      (7) Line 184: It is not clear how the authors define a swollen mitochondrion? Are there any size criteria (roundness) that can be measured to distinguish between a swollen and a non-swollen mitochondrion? It is recommended to use another terminology as often 'swollen' implies there is a difference in osmolarity but there is no experiment to support this implication.

      Thank you for your comment. We have changed the “swollen” to “vacuolar” in the revision, as shown on Page 7.

      (8) Figure S4, without a bright field image, it is hard to see the purity and morphology of the isolated prep. Please provide the bright field images together or as overlaid images.

      We agree with your comment. We have provided the overlaid images in new Figure S4A.

      (9) There is a big logical jump in what prompts the authors to look at MIWI protein levels and link the observation to the MIWI/piRNA pathway in both the Introduction and Results, even though it is one of the main findings. It is recommended to provide a better rationale and logical flow in the text.

      Thank you for your suggestion. We have added a sentence explaining why we wanted to focus on studying MIWI expression (see lines 190-193 on page 7).

      Minor comments

      (1) Please keep all the conventions of gene vs. protein nomenclature. For example, write the genes mentioned in the figures in italics with the first letter in Capital, as it is done in the main part. Proteins should be in ALL CAPITAL like FBXO24.

      The names of gene and protein have been revised in the revision, as suggested.

      (2) In the MM section, the name of the manufacturer and the location of the materials used are missing in several sections. Please go back through the MM section and add this information in the appropriate places.

      Done as suggested. Thank you!

      (3) On page 4, the authors mentioned that "Further qPCR analysis of developmental testes and purified testicular cells showed that FBXO24 mRNA was highly expressed in the round spermatids and elongating spermatids (Fig 1B-C)". Please include statistical analyses for Fig 1B-C as well as for Fig 1A to support the written statements.

      Statistical comparison was supplemented, as shown in Figure 1. P-values are denoted in figures by *p < 0.05.

      (4) Figure 3E: Please describe in more detail how the length of the midpiece was measured. Was it based on TEM images or based on fluorescent images using MitoTracker?

      As we responded to Reviewer #1, we measured the midpiece length from the sperm neck to the sperm annulus by MitoTracker staining. We have clarified this in the Materials and Methods section on Page 16.

      (5) Line 431: In the "Electron Microscopy" section of the MM part, the author should indicate the ascending ethanol series (%) used.

      Done as suggested. Thank you!

      (6) Line 432: The thickness of the sections prepared is missing, as well as an indication of the microtome used.

      We have added the section thickness and the microtome used in the Materials and Methods section on Page 16.

      (7) Line 433: If the generated tiff files have been processed with Adobe Photoshop, this information is missing.

      We have provided information on the usage of Adobe Photoshop for the generation of tiff files on Page 17.

      (8) Lines 445, 452, 467: In some places in the paper, the temperature is written with a space between the number and °C, and sometimes it is not. Please go through the paper and make it consistent. The usual spelling is 4°C.

      We have gone through the manuscript and checked all temperature notation to make it consistent. Thank you for your careful review.

      (9) Line 469: The gel documentation system used is not mentioned.

      Done as suggested. Thank you!

      (10) Line 469: The 'TM' should be superscripted.

      Done as suggested.

      (11) Line 489: A space is missing between the changes and the parenthesis.

      Done as suggested.

      (12) Line 495-496: The authors write that the fractions enriched with round spermatids after sedimentation were collected manually. Was a determination of cell concentration (e.g., 2 × 10⁶ cells/ml) performed after collection of the cells? How were the cells stored until use? Please add the sedimentation time and used temperature.

      The cells were stored in 1× Krebs buffer on ice. The cells were sedimented through a BSA density gradient for 1.5 h at 4°C. The cell concentration was determined after collection, as shown on Page 18.

      (13) Line 505: spelling error. Instead of "manufacturer's procedure" it is written "manufactures' instructions".

      The spelling error was corrected.

      (14) Line 520: Please write a short sentence on how the purification of the 16-40 nt long RNA was performed.

      RNA of 16–40 nt in length was enriched by polyacrylamide gel electrophoresis. We added this information on line 531 of Page 19.

      (15) Line 528: The version of the used GraphPad software is missing.

      The version of GraphPad software was supplemented, as shown on Page 19.

      (16) Line 677: For qPCR analyses, the number of mice analyzed (N) and a statistical evaluation are missing.

      The statistical comparison and the numbers for biological replication were added, as shown on Page 26.

      (17) Figure 3D: Please add a scale bar.

      Done as suggested. Thanks!

      (18) Line 371 and Line 377: Two times "in summary" is written. Please make one summary for the whole paper.

      This sentence was revised, as shown on Page 13.

      (19) Line 382: To be consistent in the whole paper, please write Figure 10 in bold letters.

      Done as suggested.

      (20) Please make the size and font of the references consistent with the main text.

      Done as suggested. Thanks again for your careful review.

      Reviewer #3 (Recommendations For The Authors):

      I would like to see the description of the FBXO24 immunoprecipitation experiment performed in HEK293T cells. This somatic cell line does not normally express Miwi, so how was Miwi detected in FBXO24 mCherry IP beads? It is not mentioned if Miwi is expressed from a recombinant vector in this experiment. Similarly, I would like to see a better description of the experiment with the ubiquitin peptides described towards the end of the same paragraph; it is not clear.

      Thank you for your comments. FBXO24-mCherry was expressed in HEK293T cells and the immunoprecipitates were incubated with the protein lysate of the testes (see lines 268-272 on Page 10). The description of the ubiquitin experiment was added as well, as shown in lines 283-286 on Page 10.

      Line 263: I think the term ectopic here is not appropriate, a correction is needed.

      We have changed “ectopic” to “increased” in the revision (see line 268 on Page 10).

      I would like the authors to provide a tentative explanation or evidence of why FBXO24 KO males are completely sterile, even though there are still mature sperm produced with some motility. Since there are defects in nuclear condensation it will be very relevant to check DNA damage/fragmentation, which could contribute to the sterility phenotype.

      This is a good suggestion. We reanalyzed the sperm DNA damage by TUNEL staining and show the new data in Figure S3E-F.

      Line 213: There have been some conflicting reports about the role of RNF8 in spermiogenesis, but a recent report has shown that RNF8 is not involved in histone PTMs that mediate histone to protamine transition (Abe et al Biol Reprod 2021 https://doi.org/10.1093%2Fbiolre%2Fioab132).

      Thank you for your comment. We have cited this critical reference and discussed it in the Discussion section on Page 12.

      Figure 7: I would like to see zoomed-out views of the affected exons, so that flanking unaffected exons can be used as a reference for unaffected splicing. Most of the genome browser views in this image only show affected exons and it is impossible to see if these alone are affected or if the reduced RNAseq coverage in those exons is a result of overall reduced mapped reads in these genes. Also, a fixed Y axis with the same max value should be shown for these genome browser snapshots so that the expression level is comparable between the two genotypes.

      Thank you for your comments. Loading control of RT-PCR and scale range of Y axis were added in new Figure 7.

      Minor corrections:

      Line 70: correct "..functions as protein-protein interaction..".

      Thank you for your careful review. We have corrected this sentence (see line 69 on Page 3).

      Line 101: correct "..qPCR analysis of developmental testis..".

      We have corrected this sentence (see line 100 on Page 4). Thanks again.

      Line 116: correct "..results in detective..".

      Corrected.

      Line 186: correct ".. explored..".

      Corrected.

      Line 218: correct ".. gene expressions".

      Corrected.

      Line 221: correct "..genes significantly differentiated expressed".

      Corrected.

      Line 241: FBXO24 was shown earlier in both cytoplasm and nucleus.

      We have changed “FBXO24 is mainly confined to the nucleus” to “FBXO24 expressed in the nucleus”, as shown in line 247 on Page 9.

      Line 501-502: correct "..reverse transcriptional".

      “reverse transcriptional” was changed to “reverse transcription”, as shown on Page 18.

      Line 686: correct ".. deficiency male..".

      Corrected.

      Line 769: correct "..Western blots were adopted..".

      Corrected.

      Line 784: correct "..WT tesis..".

      Corrected.

      I cannot understand exactly what is shown in Figure 9B. Some elements marked on the X-axis are single base locations (-2K, TSS, +2K) and others are stretches of sequences so they cannot be equivalent. Why is only an intron shown? There should be a measure of normalized expression on the Y-axis.

      Thank you for your questions. The X-axis shows genome segments that were scaled to the same size, over which the signal abundance was calculated using computeMatrix. To determine the source of the piRNAs, they were mapped to the gene body, including introns, CDS and UTRs. The value on the Y-axis is the normalized count.
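
      To make the idea of scaling segments to the same size concrete, the sketch below averages per-base signal from regions of different lengths into a fixed number of bins and then averages the binned profiles across regions. It only illustrates the general principle under simple assumptions; it is not the deepTools implementation, the function names are hypothetical, and the input values are made up for demonstration.

```python
def scale_to_bins(signal, n_bins):
    """Rescale a per-base signal of any length to n_bins mean values."""
    n = len(signal)
    if n == 0 or n_bins <= 0:
        raise ValueError("signal must be non-empty and n_bins must be positive")
    binned = []
    for b in range(n_bins):
        start = b * n // n_bins
        # Guarantee at least one base per bin when the region is shorter than n_bins
        end = max((b + 1) * n // n_bins, start + 1)
        chunk = signal[start:end]
        binned.append(sum(chunk) / len(chunk))
    return binned


def mean_profile(regions, n_bins):
    """Average the scaled profiles of several regions, bin by bin."""
    scaled = [scale_to_bins(region, n_bins) for region in regions]
    return [sum(column) / len(column) for column in zip(*scaled)]


if __name__ == "__main__":
    # Two illustrative regions of different lengths (made-up normalized counts)
    long_region = [1.0, 1.0, 3.0, 3.0]
    short_region = [2.0, 4.0]
    print(mean_profile([long_region, short_region], n_bins=2))
```

      After scaling, position on the X-axis is a relative position within a segment rather than a base coordinate, which is why segments of different lengths can share one axis.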

      Figure 6F is not needed.

      Figure 6F was used to illustrate the number of different types of mRNA splicing upon FBXO24 deletion in the round spermatids. To help the reader better understand the splicing changes, we decided to keep it.

      The last two paragraphs of the discussion seem to be redundant.

      Thank you for pointing this out. We have revised the last two paragraphs of the discussion.

    2. eLife assessment

      This important study provides insights into the role of FBXO24 in controlling spermiogenesis and male fertility in mice. The mouse models used and the data are convincing. This paper will interest biomedical researchers working on reproductive biology and fertility control.

    3. Reviewer #1 (Public Review):

      In this study, Li et al., report that FBXO24 contributes to sperm development by modulating alternative mRNA splicing and MIWI degradation during spermiogenesis. The authors demonstrated that FBXO24 deficiency impairs sperm head formation, midpiece compartmentalization, and axonemal/peri-axonemal organization in mature sperm, which causes sperm motility defects and male infertility. In addition, FBXO24 interacts with various mRNA splicing factors, which causes altered splicing events in Fbxo24-null round spermatids. Interestingly, FBXO24 also modulates MIWI levels via its polyubiquitination in round spermatids. Thus, the authors address that FBXO24 modulates global mRNA levels by regulating piRNA-mediated MIWI function and splicing events in testicular haploid germ cells.

      This study is performed with various experimental approaches to explore and elucidate underlying molecular mechanisms for the FBXO24-mediated sperm defects during germ cell development. Overall, the experiments were designed properly and performed well to support the authors' observation in each part. In addition, the findings in this study are useful for understanding the physiological and developmental significance of FBXO24 in the male germ line, which can provide insight into impaired sperm development and male infertility.

      In the revised manuscript, the authors address most of the concerns raised in the previous review. The following are representative remaining points.

      • Quantification of the defective, vacuolar mitochondria (80%) and missing annulus (30%) was neither shown in the figures nor described in the results; the same applies to a few other figures.

    4. Reviewer #2 (Public Review):

      Spermatogenesis describes a complex sequence of differentiation events that lead to the development of genetically distinct male germ cells. The final part of spermatogenesis is called spermiogenesis, in which spermatids differentiate into mature sperm by developing an acrosome and a motile flagellum, which are required for reaching and successfully penetrating the oocyte. This process of spermatogenesis is based on a coordinated regulation of gene expressions in round spermatids. In the current study, FBXO24 was identified as a highly expressed protein in human and mouse testis. To define its biological role in vivo, the authors generated genetically engineered Fbxo24 knockout and Fbxo24-HA-labeled transgenic mouse models.

      To elucidate the causes of the observed sterility in Fbxo24-KO males, the authors performed molecular and phenotypic analyses that revealed aberrant histone retention, incomplete axonemes, oversized chromatoid bodies (CB), and abnormal mitochondrial coiling along the sperm flagella. These results support the causal role of the FBXO24 gene in sperm motility.

      Furthermore, the authors carefully characterized by SEM, TEM and western blot analyses that deletion of FBXO24 leads to incomplete histone-to-protamine exchange and defective chromatin interaction during spermiogenesis. In addition to increased MIWI expression, the authors show that FBXO24 interacts with SCF subunits and mediates the degradation of MIWI via K48-linked polyubiquitination.

      This is a solid work demonstrating the role of FBXO24 in modulating alternative mRNA splicing, MIWI degradation and normal spermiogenesis.

    5. Reviewer #3 (Public Review):

      This work is carried out by the research group led by Shuiqiao Yuan, who has a long-standing interest in, and has made significant contributions to, the field of male germ cell development. The authors study a protein for which limited information existed prior to this work, a component of the E3 ubiquitin ligase complex, FBXO24. The authors generated the first FBXO24 KO mouse model reported in the literature using CRISPR, which they complement with an HA-tagged FBXO24 transgenic model in the KO background. The authors begin their study with a very careful examination of the expression pattern of the FBXO24 gene at the level of mRNA and the HA-tagged transgene, and they provide conclusive evidence that the protein is expressed exclusively in the mouse testis and specifically in post-meiotic spermatids of stages VI to IX, which include early stages of spermatid elongation and nuclear condensation. The authors report a fully sterile phenotype for male mice, while female mice are normal. Interestingly, the testis size and the populations of spermatogenic cells in the KO mutant mice show a small (but significant) reduction compared to the WT testis. Importantly, the mature sperm from KO animals show a series of defects that were very thoroughly documented in this work by scanning and transmission electron microscopy; this data constitutes a very strong point in this paper. FBXO24 KO sperm have severe defects in the mitochondrial sheath with missing mitochondria near the annulus, and missing outer dense fibers. Collectively these defects cause abnormal bending of the flagellum and severely reduced sperm motility. Moreover, defects in nuclear condensation are observed with faint nuclear staining of elongating and elongated spermatids, and reduction of protein levels of protamine 2 combined with increased levels of histones and transition protein 1. All the above are in line with the observed male sterility phenotype.

      The authors also performed RNASeq in the KO animal, and found profound changes in the abundance of thousands of mRNAs; changes in mRNA splicing patterns were noted as well. This data reveals deeply affected gene expression patterns in the FBXO24 KO testis, which further supports the essential role that this factor serves in spermiogenesis. Unfortunately, a molecular explanation of what causes these changes is missing; it is still possible that they are an indirect consequence of the absence of FBXO24 and not directly caused by it.

      The finding that Miwi protein levels are increased in the FBXO24 KO testis is an important point in this work, and it is in agreement with the observed increased size of the chromatoid body, where most of Miwi protein is accumulated in round spermatids. This finding is well supported with experiments performed in 293T cells showing that Miwi ubiquitination is FBXO24 dependent in this ectopic system. Moreover, the authors detect reduced ubiquitination of endogenous Miwi protein immunoprecipitated from FBXO24 KO testis. Consistent with an increase in Miwi protein levels, Miwi-sized piRNAs show increased abundance in total RNA from FBXO24 KO testis. It has been documented that Piwi proteins stabilize their piRNA cargo, so the increase in piRNA levels in 29-32 nt sizes is most likely not a result of altered biogenesis, but increased half-life of the piRNAs as a result of Miwi upregulation. piRNAs have been involved in the regulation of mRNAs in the post-meiotic spermatid, but it is unclear how increased Miwi protein and its piRNA cargo at the levels observed in this study contribute to the complete infertility phenotype of the FBXO24 KO male mice.

      Therefore, a well-reasoned narrative on whether and how the absence of FBXO24 as an E3 ubiquitin ligase is responsible for the observed mRNA and protein differential expression is largely absent. If FBXO24-mediated ubiquitination is required for normal protein degradation during spermiogenesis, protein level increase should be the direct consequence of genuine FBXO24 targets in the KO testis. Apart from Miwi, the possible involvement of ubiquitination was not shown for any other proteins that the authors found interact with FBXO24 such as splicing factors SRSF2, SRSF3, SRSF9, or any of the other proteins whose levels were found to be changed (reduced, thus the change in the KO is less likely due to absence of ubiquitination) such as ODF2, AKAP3, TSSK4, PHF7, TSSK6 and RNF8. Interestingly, the authors do observe increased amounts of histones and transition proteins, but reduced amounts of protamines, which directly shows that histone to protamine transition is indeed affected in the FBXO24 KO testis, consistent with the observed less condensed nuclei of spermatozoa. Could histones and transition proteins be targets of the proposed ubiquitin ligase activity of FBXO24, and in its absence, histone replacement is abrogated? Providing experimental evidence to address this possibility would greatly expand our understanding of why FBXO24 is essential during spermiogenesis.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one that is currently used for routinely delimiting operational taxonomic units in RNA viruses and reconstructing their evolutionary history (Edgar et al. 2022, see also the Serratus project; https://serratus.io/); therefore, we took the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand years ago to 293 million years ago depending on modeling assumptions). Only future genomic work across Coronaviridae that will characterize multiple genetic regions with different evolutionary rates will allow us to precisely elucidate the timescale of the evolutionary history of coronaviruses alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, it strongly suggests, when used along with cophylogenetic approaches, a recent evolutionary origin in bats.

      R. C. Edgar, et al., Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022).

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.

      Our intuition is that ALE in its “dated” version did not necessarily fail on our dataset due to its size (ALE ran, but provided unrealistic parameter estimates and was not able to output possible reconciliations, as mentioned in our Material and Methods section). We think it most likely failed because there is no pattern of codiversification: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between these trees with time-consistent transfers is very difficult and ALE fails at estimating an amalgamated likelihood for such an unlikely scenario. Following a suggestion from reviewer #3, we are going to try running the dated version of ALE independently on the alpha and beta-coronaviruses, resulting in smaller datasets. This will help us elucidate whether the dated version of ALE fails due to data size or the absence of a codiversification pattern.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that it is a fairly prominent issue, that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We totally agree that sampling biases in the virome of mammals are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still reported low support for a scenario of codiversification, the origin in bats in East Asia, the preferential host switches within mammalian orders, and the rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host transfers involve unsampled intermediate hosts. To address the reviewer's comment, we will better underline the importance of sampling biases in our main text and include the suggested references. We will also better highlight our sensitivity analyses by moving them from the Supporting Information Text to the main text.
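
      As a rough illustration of what subsampling to correct for unequal sampling effort can look like, the sketch below caps the number of virus records kept per host species. It is a generic example, not the procedure described in the Supporting Information; the function name `subsample_equal_effort`, the cap `max_per_host`, and the toy records are hypothetical.

```python
import random
from collections import defaultdict

def subsample_equal_effort(records, max_per_host, seed=0):
    """Keep at most max_per_host (host, virus) records for each host species."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_host = defaultdict(list)
    for host, virus in records:
        by_host[host].append(virus)
    subsampled = []
    for host in sorted(by_host):
        viruses = by_host[host]
        k = min(max_per_host, len(viruses))
        # Draw k records without replacement from the heavily sampled hosts
        subsampled.extend((host, v) for v in rng.sample(viruses, k))
    return subsampled

# Hypothetical records: one intensively screened host, one poorly screened host
records = [("host_A", f"otu_{i}") for i in range(5)] + [("host_B", "otu_9")]
balanced = subsample_equal_effort(records, max_per_host=2)
```

      Repeating such a draw with different seeds and rerunning the downstream analysis on each subsample is one common way to check whether a conclusion depends on a few intensively screened hosts.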

      We agree that distinguishing between alpha and beta coronaviruses will provide useful additional insights; we are going to run separate cophylogenetic analyses for these two sub-clades. We will report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (see 2017, PLOS Pathogens); thank you for providing this reference, which we will now discuss.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

    3. eLife assessment

      Maestri et al report the absence of phylogenetic evidence supporting codiversification of mammalian coronaviruses and their hosts, leading to the important conclusion that the evolutionary history of the virus and its hosts are decoupled through frequent host switches. The evidence for frequent host switching, derived from a probabilistic model of co-evolution, appears convincing, but evidence for quantitative statements about the time of the last common ancestor of extant mammalian coronaviruses remains incomplete. The results would be strengthened by a reconstruction of the evolutionary timescale and further investigation of robustness to sampling biases and unsampled diversity.

    4. Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.

    5. Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that it is a fairly prominent issue, that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

    1. Author Response

      Reviewer #1:

      This manuscript presents an extremely exciting and very timely analysis of the role that the nucleosome acidic patch plays in SWR1-catalyzed histone exchange. Intriguingly, SWR1 loses activity almost completely if any of the acidic patches are absent. To my knowledge, this makes SWR1 the first remodeler with such a unique and pronounced requirement for the acidic patch. The authors demonstrate that SWR1 affinity is dramatically reduced if at least one of the acidic patches is absent, pointing to a key role of the acidic patch in SWR1 binding to the nucleosome. The authors also pinpoint a specific subunit - Swc5 - that can bind nucleosomes, engage the acidic patch, and obtain a cryo-EM structure of Swc5 bound to a nucleosome. They also identify a conserved arginine-rich motif in this subunit that is critical for nucleosome binding and histone exchange in vitro and for SWR1 function in vivo. The authors provide evidence that suggests a direct interaction between this motif and the acidic patch.

      Strengths:

      The manuscript is well-written and the experimental data are of outstanding quality and importance for the field. This manuscript significantly expands our understanding of the fundamentally important and complex process of H2A.Z deposition by SWR1 and would be of great interest to a broad readership.

      We thank the reviewer for their enthusiastic and positive comments on our work.

      Reviewer #2:

      Summary:

      In this study, Baier et al. investigated the mechanism by which SWR1C recognizes nucleosomal substrates for the deposition of H2A.Z. Their data convincingly demonstrate that the nucleosome's acidic patch plays a crucial role in the substrate recognition by SWR1C. The authors presented clear evidence showing that Swc5 is a pivotal subunit involved in the interaction between SWR1C and the acidic patch. They narrowed down the specific region within Swc5 responsible for this interaction. However, two central assertions of the paper are less convincing. First, the data supporting the claim that the insertion of one Z-B dimer into the canonical nucleosome can stimulate SWR1C to insert the second Z-B dimer is somewhat questionable (see below). Given that this claim contradicts previous observations made by other groups, this hypothesis needs further testing to eliminate potential artifacts. Second, the claim that SWR1C simultaneously recognizes the acidic patch on both sides of the nucleosome also needs further investigation, as the assay used to establish this claim lacks the sensitivity necessary to distinguish any difference between nucleosomal substrates containing one or two intact acidic patches.

      Strengths:

      As mentioned in the summary, the authors presented clear evidence demonstrating the role of Swc5 in recognition of the nucleosome acidic patch. The identification of the specific region in Swc5 responsible for this interaction is important.

      We thank the reviewer for their careful critique of our work. Below we address each major concern.

      Major comments:

      (1) Figure 1B: It is unclear how much of the decrease in FRET is caused by the bleaching of fluorophores. The authors should include a negative control in which Z-B dimers are omitted from the reaction. In the absence of ZB dimers, SWR1C will not exchange histones. Therefore, any decrease in FRET should represent the bleaching of fluorophores on the nucleosomal substrate, allowing normalization of the FRET signal related to A-B eviction.

      In this manuscript, as well as in our two previous publications (Singh et al., 2019; Fan et al., 2022), we have presented the results of no enzyme controls, +/- ZB dimers, no ATP controls, or AMP-PNP controls for our FRET-based, H2A.Z deposition assay (see also Figure S3). We do not observe significant levels of photobleaching in this assay, either during ensemble measurements or in an smFRET experiment. To aid the reader, we have added the AMP-PNP data for the experiment shown in Figure 1B. The results show there is less than a 10% decrease in FRET over 30 min, and the signal from the double acidic patch-disrupted nucleosome is identical to this negative control.
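
      For readers unfamiliar with the kind of normalization the reviewer proposes, a minimal sketch is given below. It divides a sample trace by a negative-control trace after scaling each to its own starting value, so that signal loss common to both (for example, photobleaching) cancels out. This is an illustration only, not the analysis used in the manuscript; the function name `normalize_fret` and all numbers are synthetic placeholders, not measured data.

```python
import numpy as np

def normalize_fret(sample, control):
    """Correct a FRET time course for signal loss seen in a negative control."""
    sample = np.asarray(sample, dtype=float)
    control = np.asarray(control, dtype=float)
    # Express each trace relative to its own value at t = 0
    sample_rel = sample / sample[0]
    control_rel = control / control[0]
    # Dividing removes any decay that is shared with the control
    return sample_rel / control_rel

# Synthetic example: control loses 10% of its signal; the sample additionally
# loses signal through the process of interest
control_trace = [100.0, 95.0, 90.0]
sample_trace = [200.0, 114.0, 72.0]
corrected = normalize_fret(sample_trace, control_trace)
```

      With these synthetic values the corrected trace falls to 0.6 and then 0.4 of its starting value, which is the decay that remains once the control's 5% and 10% losses are divided out.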

      (2) Figure S3: The authors use the decrease in FRET signal as a metric of histone eviction. However, Figure S3 suggests that the FRET signal decrease could be due to DNA unwrapping. Histone exchange should not occur when SWR1C is incubated with AMP-PNP, as histone exchange requires ATP hydrolysis (10.7554/eLife.77352). And since the insertion of Z-B dimer and the eviction of A-B dimer are coupled, the decrease of FRET in the presence of AMP-PNP is unlikely due to histone eviction or exchange. Instead, the FRET decrease is likely due to DNA unwrapping (10.7554/eLife.77352). The authors should explicitly state what the loss of FRET means.

      We agree with the reviewer that loss of FRET can be due to DNA unwrapping from the nucleosome. We have previously demonstrated this activity by SWR1C in our smFRET study (Fan et al., 2022). However, DNA unwrapping is highly reversible and has a time duration of only 1-3 seconds. We and others have not observed stable unwrapping of nucleosomes by SWR1C; rather, the stable loss of FRET reports on dimer eviction. We assume the reviewer is concerned about the rather large decrease in FRET signal shown in the AMP-PNP controls for Figure S3, panels A and D. For the other 7 panels, the decrease in FRET with AMP-PNP is minimal. In fact, if we average all of the AMP-PNP data points, the rate of FRET loss is not statistically different from that of the no-enzyme control reactions (nucleosome plus ZB dimers).

      Data for panels A and D used a 77N0 nucleosomal substrate, with Cy3 labeling the linker distal dimer. This is our standard DNA fragment, and it was used in Figure 1B. The only difference between data sets is that the data shown in Fig 1B used nucleosomes reconstituted with a Cy5-labelled histone octamer, rather than the hexasome assembly method used for Fig S3. Three points are important. First, for all of these substrates, we assembled 3 independent nucleosomes, and the results are highly reproducible. Second, we performed a total of 6 experiments for the 77N0-Cy5 substrates to ensure that the rates were accurate (+/-ATP). Third, and most important, we do not see this decrease in FRET signal in the absence of SWR1C (no-enzyme control). These data were included in the data source file. Thus, it appears that there is significant SWR1C-induced nucleosome instability for these two hexasome-assembled substrates. We now note this in the legend to Figure S3. Key for this work, however, is that there is a large increase in the rate of FRET loss in the presence of ATP, and this rate is faster when a ZB dimer was present at the linker proximal location. In response to the last point, we state in the first paragraph of the results: “The dimer exchange activity of SWR1C is monitored by following the decrease in the 670 nm FRET signal due to eviction of the Cy5-labeled AB-Cy5 dimer (Figure 1A).”

      (3) Related to point 2. One way to distinguish nucleosomal DNA unwrapping from histone dimer eviction is that unwrapping is reversible, whereas A-B eviction is not. Therefore, if the authors remove AMP-PNP from the reaction chamber and a FRET signal reappears, then the initial loss of FRET was due to reversible DNA unwrapping. However, if the removal of AMP-PNP did not regain FRET, it means that the loss of FRET was likely due to A-B eviction. The authors should perform an AMP-PNP and/or ATP removal experiment to make sure the interpretation of the data is correct.

      See our response to item 2 above.

      (4) The nature of the error bars in Figure 1C is undefined; therefore, the statistical significance of the data is not interpretable.

      We apologize for not making this more explicit for each figure. The error bars report on 95% confidence intervals from at least 3 sets of experiments. This statement has been added to the legend.

      (5) The authors claim that the SWR1C requires intact acidic patches on both sides of the nucleosomes to exchange histone. This claim was based on the experiment in Figure 1C where they showed mutation of one of two acidic patches in the nucleosomal substrate is sufficient to inhibit SWR1C-mediated histone exchange activity. However, one could argue that the sensitivity of this assay is too low to distinguish any difference between nucleosomes with one (i.e., AB/AB-apm) versus two mutated acidic patches (i.e., AB-apm/AB-apm). The lack of sensitivity of the eviction assay can be seen when Figure 1B is taken into consideration. In the gel-shift assay, the AB-apm/AB-apm nucleosome exhibited a 10% SWR1C-mediated histone exchange activity compared to WT. However, in the eviction assay, the single AB/AB-apm mutant has no detectable activity. Therefore, to test their hypothesis, the authors should use the more sensitive in-gel histone exchange assay to see if the single AB/AB-apm mutant is more or equally active compared to the double AB-apm/AB-apm mutant.

      Our pincher model is based on three independent sets of data, not just Figure 1C. First, as noted by the reviewer, we find that disruption of either acidic patch cripples the dimer exchange activity of SWR1C in the FRET-based assay. Whether the defect is identical to that of the double APM mutant nucleosome does not seem pertinent to the model. In a second set of assays, we used fluorescence polarization to quantify the binding affinity of SWR1C for wild-type nucleosomes, a double APM nucleosome, or each single APM nucleosome. Consistent with the pincher model, each single APM disruption decreases binding affinity at least 10-fold (below the sensitivity of the assay). Finally, we monitored the ability of different nucleosomes to stimulate the ATPase activity of SWR1C. Consistent with the pincher model, a single APM disruption was sufficient to eliminate nucleosome stimulation.

      (6) The authors claim that the AZ nucleosome is a better substrate than the AA nucleosome. This is a surprising result as previous studies showed that the two insertion steps of the two Z-B dimers are not cooperative (10.7554/eLife.77352 and 10.1016/J.CELREP.2019.12.006). The authors' claim was based on the eviction assay shown in Fig 1C. However, I am not sure how much variation in the eviction assay is contributed by different preparations of nucleosomes. The authors should use the in-gel assay to independently test this hypothesis.

      For all data shown in our manuscript, at least three different nucleosome preparations were used. The impact of a ZB dimer on the rates of dimer exchange was highly reproducible among different nucleosome preparations and experiments. We also see reproducible ZB stimulation for three different substrates: with ZB on the linker proximal side, the linker distal side, and on one side of a core particle. We do not believe that our data are inconsistent with previous studies. First, the previous work referenced by the reviewer performed dimer exchange reactions with a large excess of nucleosomes over SWR1C (catalytic conditions), whereas we used single turnover reactions. Second, our study is the first to use a homogeneous, ZA heterotypic nucleosome as a substrate for SWR1C. All previous studies used a standard AA nucleosome, following the first and second rounds of dimer exchange that occur sequentially. Finally, we observe only a 20-30% increase in rate by a ZB dimer (e.g. 77N0 substrates), and such an increase was unlikely to have been detected by previous gel-based assays.

      Minor comments:

      (1) Abstract line 4: To say 'Numerous' studies have shown acidic patch impact chromatin remodeling enzymes activity may be too strong.

      Removed

      (2) Page 15, line 15: The authors claim that swc5∆ was inviable on formamide media. However, the data in Figure 8 shows cell growth in column 1 of swc5∆.

      The term ‘inviable’ has been replaced with ‘poor’ or ‘slow growth’.

      (3) The authors should use standard yeast nomenclature when describing yeast genes and proteins. For example, for Figure 8 and legend, Swc5∆ was used to describe the yeast strain BY4741; MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YBR231c::kanMX4. Instead, the authors should describe the swc5∆ mutant strain as BY4741 MAT a his3∆1 leu2∆0 met15∆0 ura3∆0 swc5∆::kanMX4. Exogenous plasmid should also be indicated in italics and inside brackets, such as [SWC5-URA3] or [swc5(R219A)-URA3].

      We apologize for missing this mistake in the Figure 8 legend. We had inadvertently copied this from the EUROSCARF entry and forgot to edit it. We decided not to add all the plasmid names to the figure, as it was too cluttered. We state in the figure legend that the panels show growth of swc5 deletion strains harboring the indicated swc5 alleles on CEN/ARS plasmids.

      (4) According to Lin et al. 2017 NAR (doi: 10.1093/nar/gkx414), there is only one Swc5 subunit per SWR1C. Therefore, the pincher model proposed by the authors would suggest that there is a missing subunit that recognizes the second acidic patch. The authors should point out this fact in the discussion. However, as mentioned in Major comment 6, I am not sure if the pincer model is substantiated.

      In our discussion, we had noted that the published cryoEM structure suggested that the Swc2 subunit likely interacts with the acidic patch on the dimer that is not targeted for replacement, and we proposed that Swc5 interacts with the acidic patch on the exchanging H2A/H2B dimer. We have now made this clearer in the text.

    2. eLife assessment

      This manuscript presents an important analysis of the role that the nucleosome acidic patch plays in SWR1-catalyzed histone exchange. This manuscript contains convincing data which significantly expands our understanding of the complex process of H2A.Z deposition by SWR1 and therefore would be of interest to a broad readership. The manuscript would benefit from addressing previous models in the field, specifically regarding the insertion of the second dimer of H2A.Z/H2B; and the involvement of the acidic patch recognition by SWR1. These points should be addressed more directly with additional data.

    3. Reviewer #1 (Public Review):

      This manuscript presents an extremely exciting and very timely analysis of the role that the nucleosome acidic patch plays in SWR1-catalyzed histone exchange. Intriguingly, SWR1 loses activity almost completely if any of the acidic patches are absent. To my knowledge, this makes SWR1 the first remodeler with such a unique and pronounced requirement for the acidic patch. The authors demonstrate that SWR1 affinity is dramatically reduced if at least one of the acidic patches is absent, pointing to a key role of the acidic patch in SWR1 binding to the nucleosome. The authors also pinpoint a specific subunit, Swc5, that can bind nucleosomes and engage the acidic patch, and they obtain a cryo-EM structure of Swc5 bound to a nucleosome. They also identify a conserved arginine-rich motif in this subunit that is critical for nucleosome binding and histone exchange in vitro and for SWR1 function in vivo. The authors provide evidence that suggests a direct interaction between this motif and the acidic patch.

      Strengths:

      The manuscript is well-written and the experimental data are of outstanding quality and importance for the field. This manuscript significantly expands our understanding of the fundamentally important and complex process of H2A.Z deposition by SWR1 and would be of great interest to a broad readership.

    4. Reviewer #2 (Public Review):

      Summary:

      In this study, Baier et al. investigated the mechanism by which SWR1C recognizes nucleosomal substrates for the deposition of H2A.Z. Their data convincingly demonstrate that the nucleosome's acidic patch plays a crucial role in the substrate recognition by SWR1C. The authors presented clear evidence showing that Swc5 is a pivotal subunit involved in the interaction between SWR1C and the acidic patch. They pared down the specific region within Swc5 responsible for this interaction. However, two central assertions of the paper are less convincing. First, the data supporting the claim that the insertion of one Z-B dimer into the canonical nucleosome can stimulate SWR1C to insert the second Z-B dimer is somewhat questionable (see below). Given that this claim contradicts previous observations made by other groups, this hypothesis needs further testing to eliminate potential artifacts. Secondly, the claim that SWR1C simultaneously recognizes the acidic patch on both sides of the nucleosome also needs further investigation, as the assay used to establish this claim lacks the sensitivity necessary to distinguish any difference between nucleosomal substrates containing one or two intact acidic patches.

      Strengths:

      As mentioned in the summary, the authors presented clear evidence demonstrating the role of Swc5 in recognition of the nucleosome acidic patch. The identification of the specific region in Swc5 responsible for this interaction is important.

      Weaknesses:

      Major comments:

      (1) Figure 1B: It is unclear how much of the decrease in FRET is caused by the bleaching of fluorophores. The authors should include a negative control in which Z-B dimers are omitted from the reaction. In the absence of ZB dimers, SWR1C will not exchange histones. Therefore, any decrease in FRET should represent the bleaching of fluorophores on the nucleosomal substrate, allowing normalization of the FRET signal related to A-B eviction.

      (2) Figure S3: The authors use the decrease in FRET signal as a metric of histone eviction. However, Figure S3 suggests that the FRET signal decrease could be due to DNA unwrapping. Histone exchange should not occur when SWR1C is incubated with AMP-PNP, as histone exchange requires ATP hydrolysis (10.7554/eLife.77352). And since the insertion of Z-B dimer and the eviction of A-B dimer are coupled, the decrease of FRET in the presence of AMP-PNP is unlikely due to histone eviction or exchange. Instead, the FRET decrease is likely due to DNA unwrapping (10.7554/eLife.77352). The authors should explicitly state what the loss of FRET means.

      (3) Related to point 2. One way to distinguish nucleosomal DNA unwrapping from histone dimer eviction is that unwrapping is reversible, whereas A-B eviction is not. Therefore, if the authors remove AMP-PNP from the reaction chamber and a FRET signal reappears, then the initial loss of FRET was due to reversible DNA unwrapping. However, if the removal of AMP-PNP did not regain FRET, it means that the loss of FRET was likely due to A-B eviction. The authors should perform an AMP-PNP and/or ATP removal experiment to make sure the interpretation of the data is correct.

      (4) The nature of the error bars in Figure 1C is undefined; therefore, the statistical significance of the data is not interpretable.

      (5) The authors claim that the SWR1C requires intact acidic patches on both sides of the nucleosomes to exchange histone. This claim was based on the experiment in Figure 1C where they showed mutation of one of two acidic patches in the nucleosomal substrate is sufficient to inhibit SWR1C-mediated histone exchange activity. However, one could argue that the sensitivity of this assay is too low to distinguish any difference between nucleosomes with one (i.e., AB/AB-apm) versus two mutated acidic patches (i.e., AB-apm/AB-apm). The lack of sensitivity of the eviction assay can be seen when Figure 1B is taken into consideration. In the gel-shift assay, the AB-apm/AB-apm nucleosome exhibited a 10% SWR1C-mediated histone exchange activity compared to WT. However, in the eviction assay, the single AB/AB-apm mutant has no detectable activity. Therefore, to test their hypothesis, the authors should use the more sensitive in-gel histone exchange assay to see if the single AB/AB-apm mutant is more or equally active compared to the double AB-apm/AB-apm mutant.

      (6) The authors claim that the AZ nucleosome is a better substrate than the AA nucleosome. This is a surprising result as previous studies showed that the two insertion steps of the two Z-B dimers are not cooperative (10.7554/eLife.77352 and 10.1016/J.CELREP.2019.12.006). The authors' claim was based on the eviction assay shown in Fig 1C. However, I am not sure how much variation in the eviction assay is contributed by different preparations of nucleosomes. The authors should use the in-gel assay to independently test this hypothesis.

      Minor comments:

      (1) Abstract line 4: To say 'Numerous' studies have shown acidic patch impact chromatin remodeling enzymes activity may be too strong.

      (2) Page 15, line 15: The authors claim that swc5∆ was inviable on formamide media. However, the data in Figure 8 shows cell growth in column 1 of swc5∆.

      (3) The authors should use standard yeast nomenclature when describing yeast genes and proteins. For example, for Figure 8 and legend, Swc5∆ was used to describe the yeast strain BY4741; MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YBR231c::kanMX4. Instead, the authors should describe the swc5∆ mutant strain as BY4741 MAT a his3∆1 leu2∆0 met15∆0 ura3∆0 swc5∆::kanMX4. Exogenous plasmid should also be indicated in italics and inside brackets, such as [SWC5-URA3] or [swc5(R219A)-URA3].

      (4) According to Lin et al. 2017 NAR (doi: 10.1093/nar/gkx414), there is only one Swc5 subunit per SWR1C. Therefore, the pincher model proposed by the authors would suggest that there is a missing subunit that recognizes the second acidic patch. The authors should point out this fact in the discussion. However, as mentioned in Major comment 6, I am not sure if the pincer model is substantiated.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important work by Park et al. introduces an open-top two-photon light sheet microscopy (OT-TP-LSM) for less invasive evaluation of intraoperative 3D pathology. The authors provide convincing evidence for the effectiveness of this technique in investigating various human cancer cells. The paper needs some minor corrections and has the potential to be of broad interest to biologists and, specifically, pathologists utilizing 3D optical microscopy.

      We would like to thank the editor for the positive general comment. We revised the manuscript by addressing the reviewers' comments.

      Public Reviews:

      Reviewer 1

      Summary:

      A2. This manuscript presents the development of a new microscope method termed "open-top two-photon light sheet microscopy (OT-TP-LSM)". While the key aspects of the new approach (open-top LSM and Two-photon microscopy) have been demonstrated separately, this is the first system of integrating the two. The integration provides better imaging depth than a single-photon excitation OT-LSM.

      Strengths:

      The use of liquid prism to minimize the aberration induced by index mismatching is interesting and potentially helpful to other researchers in the field.

      • The use of propidium iodide (PI) provided a deeper imaging depth.

      Weaknesses:

      Details are lacking on imaging time, data size, the processing time to generate large-area en face images, and inference time to generate pseudo H&E images. This makes it difficult to assess how applicable the new microscope approach might be in various pathology applications.

      B2. We would like to thank the reviewer for the critical and positive comments. We agree with the reviewer that detailed information such as processing time is missing.

      The imaging time and data size were estimated per 1 cm² area and were 7 min and 318 GB (= (7 × 60) s × 400 fps × (1850 × 512 × 2) byte) for each channel, respectively. The time for processing en-face images was relatively long, taking ~1.7 s Gb−1 after loading the image dataset at ~6.8 s Gb−1 in the current setting, and needs to be shortened for intraoperative application. The time for converting OT-TP-LSM images of 512 × 512 pixels into virtual H&E staining images was 160 ms. This study aimed to address current limitations of 3D pathology, such as imaging depth, and to develop the image processing needed to generate virtual H&E images. Further development, such as speeding up the image processing, would be needed. We added the missing information and included some discussion on the limitations of the new system and on further development for intraoperative applications.
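The data-size figure quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustrative check, not part of the authors' pipeline. It assumes that the factor of 2 in the formula is the number of bytes per pixel, that "Gb" in the quoted rates means gigabytes, and that the loading and processing rates simply add; the function and variable names are hypothetical.

```python
def raw_data_bytes(imaging_time_s, frame_rate_fps, width_px, height_px, bytes_per_pixel):
    """Raw data volume for one channel: frames acquired times bytes per frame."""
    n_frames = imaging_time_s * frame_rate_fps
    return n_frames * width_px * height_px * bytes_per_pixel


# Values quoted in the response: 7 min per 1 cm^2, 400 fps, 1850 x 512 pixels.
# Assumption: the trailing "x 2" is 2 bytes per pixel.
size_bytes = raw_data_bytes(7 * 60, 400, 1850, 512, 2)
size_gb = size_bytes / 1e9  # decimal gigabytes

# Assumption: ~6.8 s per GB loading plus ~1.7 s per GB en-face processing, applied in sequence.
processing_min = size_gb * (6.8 + 1.7) / 60

print(f"{size_gb:.0f} GB per channel, ~{processing_min:.0f} min to load and process")
```

Under these assumptions the estimate comes to roughly three quarters of an hour per channel for each square centimeter, which is consistent with the statement that post-processing must be shortened before intraoperative use.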

      C1-1. Revised manuscript, Discussion, pages 14-15 and lines 320-328

      Although OT-TP-LSM enabled high-speed 3D imaging, the post-processing time of the OT-TP-LSM image datasets was relatively long due to the large data size, sequential processing of dual channel images, and manual stitching. The long post-processing time needs to be resolved for intraoperative applications. To speed up processing, these processing steps can be performed using field-programmable gate array (FPGA)-based data acquisition with graphics processing unit (GPU)-based computing. The processing time can be further reduced by coding the algorithm in a C++-based environment. Furthermore, ImageJ-based software such as the BigStitcher plugin can be used for automatic 3D image processing [44].

      C1-2. Revised manuscript, Materials and methods, Image acquisition and post-processing, page 17 and lines 390-398

      Image acquisition and post-processing

      Raw image datasets from dual sCMOS cameras were acquired and processed on a workstation with 128 GB RAM and a 2 TB SSD drive. The imaging time and data size per 1 cm² area at 400 fps were 7 min and 318 GB (= (7 × 60) s × 400 fps × (1850 × 512 × 2) byte) for each channel, respectively. The raw image strip was sheared at 45° with respect to the sample surface, and a custom image processing algorithm was used to transform the image data into XYZ coordinates. The processing for en-face images was conducted in MATLAB and took ~1.7 s Gb−1 after loading the image dataset at ~6.8 s Gb−1 in the current laboratory setting. Mosaic images were generated by joining the image strips manually.
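The shear correction described above can be illustrated with a minimal nearest-neighbour deskew. This is a sketch under stated assumptions, not the authors' MATLAB algorithm: it treats one X-Z slice only, assumes each camera row lies along a light sheet tilted by `angle_deg` from the sample surface, and assumes the stage advances by `step_um` between frames. The function name and all parameters are hypothetical.

```python
import math


def deskew_slice(frames, pixel_um, step_um, angle_deg=45.0):
    """Map frames[i][j] (scan frame i, pixel j along the tilted sheet) onto an X-Z grid.

    Pixel j sits at lateral offset j * pixel_um * cos(angle) from the frame origin,
    so each output row (one depth) is shifted sideways by that offset, rounded to
    whole scan steps. Output row j corresponds to depth j * pixel_um * sin(angle).
    """
    n_frames, n_pix = len(frames), len(frames[0])
    cos_a = math.cos(math.radians(angle_deg))
    shifts = [round(j * pixel_um * cos_a / step_um) for j in range(n_pix)]
    width = n_frames + shifts[-1]  # widest row needs the largest shift
    out = [[0.0] * width for _ in range(n_pix)]
    for i, frame in enumerate(frames):
        for j, value in enumerate(frame):
            out[j][i + shifts[j]] = value
    return out


# Toy example: 3 frames of 4 pixels, with a single bright pixel.
frames = [[0.0] * 4 for _ in range(3)]
frames[2][3] = 1.0
step = math.cos(math.radians(45.0))  # chosen so each pixel shifts by one column
deskewed = deskew_slice(frames, pixel_um=1.0, step_um=step)
```

A production implementation would typically interpolate rather than round, and would process the full 3D volume; the integer shift is used here only to keep the geometry easy to follow.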

      C1-3. Revised manuscript, Materials and methods, Virtual H&E staining of OT-TP-LSM via deep learning network, page 18 and lines 414-418

      The CycleGAN training and testing were performed using a Nvidia GeForce RTX 3090 with 24 GB RAM. The network was implemented using Python version 3.8.0 on a desktop computer with a Core i7-12700K CPU@3.61 GHz and 64 GB RAM, running Anaconda (version 22.9.0). The inference time for converting an OT-TP-LSM patch image into a virtual H&E patch image was measured as 160 ms.

      Reviewer 2

      Summary:

      A2. In this manuscript, the authors developed an open-top two-photon light sheet microscopy (OT-TP-LSM) that enables high-throughput and high-depth investigation of 3D cell structures. The data presented here shows that OT-TP-LSM could be a complementary technique to traditional imaging workflows of human cancer cells.

      Strengths:

      High-speed and high-depth imaging of human cells in an open-top configuration is the main strength of the presented study. An extended depth of field of 180 µm in 0.9 µm thickness was achieved together with an acquisition of 0.24 mm²/s. This was confirmed by 3D visualization of human cancer cells in the skin, pancreas, and prostate.

      Weaknesses:

      The complementary aspect of the presented technique in human pathological samples is not convincingly presented. The traditional hematoxylin and eosin (H&E) staining is a well-established and widely used technique to detect human cancer cells. What would be the benefit of 3D cell visualization in an OT-TP-LSM microscope for cancer detection in addition to H&E staining?

      B2. We would like to thank the reviewer for the critical and positive comments. 3D pathology has been a long-standing research direction. Current pathology is 2D, relying on the examination of H&E histology slides generated by thin sectioning of biopsied and surgical specimens at different depths. The reliability of the pathological diagnosis suffers from undersampling of specimens. Although 3D pathology is possible by serial thin-sectioning, imaging, and then combining the images in 3D, it is not practical for clinical use due to the required labor and time.

      We demonstrated the advantages of OT-TP-LSM in various human cancer tissues. The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. We revised the manuscript to explain the benefits of 3D pathology with OT-TP-LSM.

      C2-1. Revised manuscript, Results, 3D OT-TP-LSM imaging of human skin cancers, pages 8-9 and lines 176-180

      Using 3D visualization, normal glandular structures in the dermis were distinguished from BCC tumor nests (Video 1). Both eccrine and sebaceous glands could appear similar to BCC nests in 2D images at certain depths. Hence, nondestructive 3D visualization of cell structures would be important for distinguishing them, serving as a complement to the traditional 2D H&E images.

      C2-2. Revised manuscript, Results, 3D OT-TP-LSM imaging of human pancreatic cancers, pages 10-11 and lines 222-232

      Magnified images of ROI 1 (PDAC) at two different depths showed irregularly shaped glands with sharp angles and 3D structural complexity including unstable bridging structure inside (Figure 4B). An irregular and distorted architecture amidst desmoplastic stroma is one of the important diagnostic factors for PDAC [35]. The cancer glands exhibited disorganized cancer cell arrangement with nuclear membrane distortion. Magnified images of ROI 2 showed both nonneoplastic ducts and cancer glands in different cell arrangements (Figure 4C). The nonneoplastic ducts showed single-layered epithelium with small, evenly distributed cells expressing relatively high nuclear fluorescence. Cancer glands, on the other hand, had disorganized and multilayered structure with large nuclei. OT-TP-LSM visualized the 3D invasiveness of cancer glands within tissues nondestructively, which could not be identified from limited 2D information.

      C2-3. Revised manuscript, Results, 3D OT-TP-LSM imaging of human prostatic cancers, page 11 and lines 251-252

      OT-TP-LSM provided histological 3D information equivalent to that of the H&E stained image without the need for sectioning.

      C2-4. Revised manuscript, Discussion, page 12 and lines 274-276

      OT-TP-LSM was developed for the rapid and precise nondestructive 3D pathological examination of excised tissue specimens during both biopsy and surgery, as a complement to traditional 2D H&E pathology by visualizing 3D cell structures.

      C2-5. Revised manuscript, Discussion, page 13 and lines 284-288

      The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. These have been challenging with 2D histological images.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest the following points to the authors to enhance the readability of the manuscript and to provide a strong narrative to explain their findings:

      A3. Line 54: For the non-expert readers, please provide more background information about the histopathology before introducing the hematoxylin and eosin staining.

      B3. We would like to thank the reviewer for the comment. As suggested by the reviewer, we added information about the current standard method of histopathological examination and its limitations.

      C3. Revised manuscript, introduction, page 4 and lines 56-64

      Precise intraoperative cancer diagnosis is crucial for achieving optimal patient outcomes by enabling complete tumor removal. The standard method is the microscopic cellular examination of surgically excised specimens following various processing steps, including thin sectioning and hematoxylin and eosin (H&E) cell staining. However, this examination method is laborious and time-consuming. Furthermore, it has inherent artifacts that disturb accurate diagnosis, including tissue loss, limited two-dimensional (2D) information, and sampling error [1]. High-speed three-dimensional (3D) optical microscopy, which can visualize cellular structures without thin sectioning, holds promise for nondestructive 3D pathological examination that complements the limitations of 2D pathology [1-4].

      A4. Lines 66 and 71: Please briefly introduce the cited studies to give some information about the previous work. This will help the reader understand the innovative aspects of your study.

      B4. We would like to thank the reviewer for the comment. As suggested by the reviewer, we added a brief introduction about the cited studies.

      C4. Revised manuscript, introduction, pages 4-5 and lines 71-82

      As a deep tissue imaging method, two-photon microscopy (TPM) has been used in both biological and optical biopsy studies [17-19]. TPM is based on nonlinear two-photon excitation of fluorophores and achieves high imaging depths down to a few hundred micrometers by using long excitation wavelengths, which reduce light scattering. Moreover, TPM provides additional intrinsic second harmonic generation (SHG) contrast for visualizing collagen fibers within the extracellular matrix (ECM). This feature proved advantageous for high-contrast imaging of cancer tissue and microenvironmental analysis [20-22]. However, TPM has low imaging speeds due to point scanning-based imaging. To address this limitation, two-photon LSM (TP-LSM) techniques were developed for high-speed imaging [23-27]. Although TP-LSM facilitated rapid 3D imaging of cancer cells and zebrafish, its applications were limited to small samples and biological studies due to geometric limitations.

      A5. Line 72: Please mention the importance and benefit of having an open-top configuration. I think this is one of the key aspects that provide a high imaging depth in OT-TP-LSM.

      B5. We would like to thank the reviewer for the comment. Conventional LSM techniques, including TP-LSM, have a configuration in which the illumination objective is oriented in the horizontal plane and imaging is performed with orthogonally arranged objectives. However, this geometry physically limits the lateral sample size and is unsuitable for imaging centimeter-scale tissue. Therefore, we developed OT-TP-LSM for 3D examination of large tissues. High imaging depths were achieved with long excitation wavelengths and long emission wavelengths of fluorophores. The open-top configuration does not contribute to the improvement of imaging depth. We revised the manuscript to explain the need for the open-top configuration.

      C5. Revised manuscript, introduction, page 5 and lines 82-86

      Conventional TP-LSM had a configuration of a horizontally oriented illumination objective and a vertically oriented imaging objective. This geometry imposed limitations on the sample size, rendering it unsuitable for the examination of centimeter-scale specimens. TP-LSM with open-top configuration is needed for 3D histological examination.

      A6. Line 78: It would be nice to clearly quantify the imaging depth here.

      B6. We would like to thank the reviewer for the comment. Although we considered entering the quantitative imaging depth of OT-TP-LSM in the introduction section, we decided that it would be appropriate to present the quantitative imaging depth in the Results section and discuss it in the Discussion section.

      A7. Line 146: Please clearly explain the reason why the upper layers are not resolved.

      B7. We would like to thank the reviewer for the comment and we are sorry for the missing information. The skin epidermis has various cell layers and superficial layers are composed of less rounded and flat cells with relatively small cytoplasm. Therefore, cells in that layer could be difficult to resolve with the current system resolution because there is little space between nuclei. Additionally, strong autofluorescence signal in the stratum corneum could be the reason for preventing visualization of the cells in the superficial layer. We revised the manuscript to explain the reasons in detail.

      C7. Revised manuscript, Results, 3D OT-TP-LSM imaging of human skin cancers, page 8 and lines 159-163

      Keratinocytes in the basal layer were relatively large and individually resolved, while those in the upper layers were unresolved and appeared as a band. It could be attributed to the upper layers being comprised of flat cells with relatively small cytoplasm, resulting in little space between nuclei. Additionally, strong autofluorescence signal in the stratum corneum might prevent visualization of the cells in the superficial layer.

      A8. Line 253: Please explain the importance of visualization of 3D cell structures in cancer pathology. I think this should be stated clearly throughout the text as it is the key component of OT-TP-LSM to complement the traditional H&E staining. Also, referring to the non-destructive manner of your technique would help to emphasize this point.

      B8. We would like to thank the reviewer for the comment. As answered in A2, the current H&E histological examination has inherent limitations due to limited 2D information and sampling errors. To resolve this, OT-TP-LSM was developed for the visualization of 3D cell structures nondestructively as a complement to traditional slide-based 2D pathology. We demonstrated the advantages of OT-TP-LSM in various human cancer tissues. The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. We revised the manuscript to explain the benefits of 3D pathology with OT-TP-LSM.

      C8. Please refer to the answer in C2-1 – C2-5.

      A9. Figures: Please clearly mark the cancer regions in the images as indicated in Figure 5. It will help the reader to easily compare the healthy and invaded tissue parts.

      B9. We would like to thank the reviewer for the comment. We confirmed that the cancer area is not marked in Figure 4 of the pancreatic cancer tissue. We modified Figure 4 to mark the cancer region. Additionally, Figure 2 of the skin cancer tissue was also modified in this regard.

      C9. Modified Figure 2 and Figure 4.

      Author response image 1.

      Author response image 2.

    2. eLife assessment

      This important work by Park et al. demonstrates an open-top two-photon light sheet microscopy (OT-TP-LSM) for less invasive evaluation of intraoperative 3D pathology. The authors provide convincing evidence for the effectiveness of this technique investigating various human cancer cells. This article will be of broad interest to biologists and, specifically, pathologists utilizing 3D optical microscopy.

    3. Reviewer #1 (Public Review):

      Summary:

      This manuscript presents the development of a new microscopy method termed "open-top two-photon light sheet microscopy (OT-TP-LSM)". While the key aspects of the new approach (open-top LSM and two-photon microscopy) have been demonstrated separately, this is the first system to integrate the two. The integration provides better imaging depth than a single-photon excitation OT-LSM.

      Strengths:

      - Use of liquid prism to minimize the aberration induced by index mismatching is interesting and potentially helpful to other researchers in the field.
      - Use of propidium iodide (PI) provided a deeper imaging depth.

      Weaknesses:

      - None noted.

    4. Reviewer #2 (Public Review):

      In this manuscript, the authors developed an open-top two-photon light sheet microscopy (OT-TP-LSM) that enables high-throughput and high-depth investigation of 3D cell structures. The data presented here show that OT-TP-LSM could be a complementary technique to traditional imaging workflows of human cancer cells.

      High-speed and high-depth imaging of human cells in an open-top configuration is the main strength of the presented study. An extended depth of field of 180 µm in 0.9 µm thickness was achieved together with an acquisition rate of 0.24 mm²/s. This was confirmed by 3D visualization of human cancer cells in the skin, pancreas, and prostate.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The paper addresses the important question of how numerical information is represented in the human brain. Experimental findings are interpreted as providing evidence for a sensorimotor mechanism that involves channels, each tuned to a particular numerical range. However, the logic of the channel concept as employed here, as well as the claims regarding a sensorimotor basis for these channels, is incomplete and thus requires clarification and/or modification.

      Reviewer #1 Public Review

      Anobile and colleagues present a manuscript detailing an account of numerosity processing with an appeal to a two-channel model. Specifically, the authors propose that the perception of numerosity relies on (at least) two distinct channels for small and large numerosities, which should be evident in subject reports of perceived numerosity. To do this, the authors had subjects reproduce visual dot arrays of numerosities ranging from 8 to 32 dots, by having subjects repetitively press a response key at a pre-instructed rate (fast or slow) until the number of presses equaled the number of perceived dots. The subjects performed the task remarkably well, yet with a general bias to overestimate the number of presented dots. Further, no difference was observed in the precision of responses across numerosities, providing evidence for a scalar system. No differences between fast and slow tapping were observed. For behavioral analysis, the authors examined correlations between the Weber fractions for all presented numerosities. Here, it was found that the precision at each numerosity was similar to that at neighboring numerosities, but less similar to more distant ones. The authors then went on to conduct PCA and clustering analyses on the weber fractions, finding that the first two components exhibited an interaction with the presented numerosity, such that each was dominant at distinct lower and upper ranges and further well-fit by a log-Gaussian model consistent with the channel explanation proposed at the beginning.

      Overall, the authors provide compelling evidence for a two-channel system supporting numerosity processing that is instantiated in sensorimotor processes. A strength of the presented work is the principled approach the authors took to identify mechanisms, as well as the controls put in place to ensure adequate data for analysis. Some questions do remain in the data, and there are aspects of the presentation that could be adjusted.

      • The use of a binary colormap for the correlation matrix seems unnecessary. Binary colormaps between two opposing colors (with white in the middle) are best for results spanning positive and negative values (say, correlation values between -1 and +1), but the correlations here are all positive, so a uniform colormap should be applied. I can appreciate that the authors were trying to emphasize that a 2+ channel system would lead to lower correlations at larger ratios, but that's emphasized better in the numerical ratio line plots.

      We agree and have now changed the colour maps accordingly (Fig 1 and 3, p. 4 and 11). Thank you.

      • In Figure 1, the correlation matrices in Figure 1 appear blurred out. I am not sure if this was intentional but suspect it was not, and so they should appear like those presented in Figure 3.

      Sorry about that, it was a rendering problem. Now fixed.

      • It's notable that the authors also collected data on a timing task to rule out a duration-based strategy in the numerosity task. If possible, it would be great to have the author also conduct the rest of the analyses on the duration task as well; that is, to look at WF correlation matrices/ratios as well as PCA. There is evidence that duration processing is also distinctly sensorimotor, and may also rely on similar channels. Evidence either for or against this would likely be of great interest.

      We agree that investigating the existence of temporal channels would be of great interest, but it goes beyond the scope of the current study. Out of curiosity, however, we analysed the duration data. Interestingly, signatures of sensorimotor channels (a correlation gradient as a function of duration distance) emerge. Notably, this does not hold when correlating number against duration data. These results (if confirmed) would indicate the existence of independent mechanisms for time and numerosity perception. Our research agenda is now proceeding in this direction.

      • For the duration task, there was no fast-tapping condition. Why not? Was this to keep the overall task length short?

      Yes, this was the main reason.

      • The number of subjects/trials seems a bit odd. Why did some subjects perform both and not others? The targets say they were presented "between 25 and 30 times", but why was this variable at all?

      The two experimental conditions were demanding, lasting around 2 hours each. Some participants, unfortunately, were available for just one slot. To make the two conditions similarly powered, we added some extra non-shared participants. Trials were divided into blocks of 55 trials (5 repetitions for each target). Most of the participants performed 6 blocks in both conditions; a few of them (again because of limited availability) performed 5 blocks.

      • For the PCA analysis, my read of the methods and results is that this was done on all the data, across subjects. If the data were run on individual subjects and the resulting PCA components averaged, would the same results be found?

      We thank the reviewer for giving us the opportunity to clarify the technique.

      In brief: we measured precision (Weber Fraction) in translating digits (target numbers) into corresponding action sequences. This creates an m by n matrix, each column (n) representing a participant, each row (m) a target number. This matrix was then submitted to PCA. The analysis yielded two components. Each target number was assigned two loading scores: one representing the loading on the 1st and one on the 2nd component. These loadings were then displayed as a function of targets, to describe the tunings. This analysis, by its nature, is across-participants and cannot be performed on individual data.
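
      As an illustration only (this is not the authors' code), the PCA step described above can be sketched on synthetic data. The target numbers are those listed in the response; the number of participants, the log-Gaussian channel shapes peaking near 10 and 27, the noise levels, and the absence of any component rotation are all assumptions made for the example.

```python
import numpy as np

# Target numbers reported in the response.
targets = np.array([8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32])

rng = np.random.default_rng(0)
n_participants = 30  # hypothetical sample size


def log_gaussian(x, peak, width):
    """Log-Gaussian tuning curve (assumed channel shape)."""
    return np.exp(-0.5 * ((np.log(x) - np.log(peak)) / width) ** 2)


# Two hypothetical channels; each participant has an independent
# "strength" on each channel, which drives their Weber fractions.
low = log_gaussian(targets, 10, 0.4)
high = log_gaussian(targets, 27, 0.4)
a = rng.normal(size=(n_participants, 1))
b = rng.normal(size=(n_participants, 1))
noise = 0.005 * rng.normal(size=(n_participants, targets.size))

# Rows = participants, columns = targets (the transpose of the
# m-by-n layout in the text, so that targets are the PCA variables).
wf = 0.1 + 0.02 * (a * low + b * high) + noise

# PCA via SVD of the column-centred matrix.
centered = wf - wf.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)

# One loading per target on each of the first two components.
loadings = vt[:2].T * s[:2] / np.sqrt(n_participants - 1)

for t, (l1, l2) in zip(targets, loadings):
    print(f"target {t:2d}: PC1 {l1:+.4f}  PC2 {l2:+.4f}")
```

      Plotting the two columns of `loadings` against `targets` gives the kind of tuning profile the response describes. Because each loading is defined over the whole participant sample, the analysis has no single-participant counterpart.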

      • For the data presented in Figure 2, it would be helpful to also see individual subject data underlaid on the plots to get a sense of individual differences. For the reproduced number, these will likely be clustered together given how small the error bars are, but for the WF data it may show how consistently "flat" the data are. Indeed, in other magnitude reproduction tasks, it is not uncommon to see the WF decrease as a function of target magnitude (or even increase). It may be possible that the reason for the observed findings is that some subjects get more variable (higher WFs) with larger target numbers and others get less variable (lower WFs).

      We agree and have now added individual data, confirming flat WF distributions (Fig 2 B&D).

      • Regarding the two-channel model, I wonder how much the results would translate to different ranges of numerosities? For example, are the two channels supported here specific to these ranges of low and high numbers, or would there be a re-mapping to a higher range (say, 32 to 64 dots) or to a narrower range (say 16 to 32 dots). It would be helpful to know if there is any evidence for this kind of remapping.

      This is the first study measuring sensorimotor channels for the transformation of numbers into action sequences. Whether these channels are modulated by the numerical context is an interesting open question that we are exploring through specific experimental conditions (now discussed at p. 17, lines 451-460).

      Reviewer #2 Public Review

      The authors wish to apply established psychophysical methods to the study of number. Specifically, they wish to test the hypothesis - supported by their previous work - that human sensorimotor processes are tuned to specific number ranges. In a novel set of tasks, they ask participants to tap a button N times (either fast or slow), where N varies between 8 and 32 across trials. As I understood it, they then computed the Weber fraction (WF) for each participant for each number and correlated those values across participants and numbers. They find stronger correlations for nearby numbers than for distant numbers and interpret this as evidence of sensorimotor tuning functions. Two other analyses - cluster analyses and principal component analyses (PCA) - suggest that participants' performance relied on at least 2 mechanisms, one for encoding low numbers of taps (around 10) and another for encoding larger numbers (around 27).

      Strengths

      Individual differences can be a rich source of scientific insight and I applaud the authors for taking them seriously, and for exploring new avenues in the study of numerical cognition.

      Weaknesses

      Inter-subject-correlation

      The experiment "is based on the idea that interindividual variability conveys information that can reveal common sensory processes (Peterzell & Kennedy, 2016)" but I struggled to understand the logic of this technique. The authors explain it most clearly when they write "Regions of high intercorrelation between neighbouring stimuli intensity can be interpreted to imply that sets of stimuli are processed by the same (shared) underlying channel. This channel, while responding relatively more to its preferred stimulus, will also be activated by neighbouring stimuli that although slightly different from the preferred intensity, are nevertheless included in the same response distribution." As I understood it, the correlations are performed "between participants, for all targets values" - meaning that they are measuring the extent to which different participants' WFs vary together. But why is this a good measure of channels? This analysis seems to assume that if people have channels for numerical estimation, they will have the same channels, tuned to the same numerical ranges. But this is an empirical question - individual participants could have wildly different channels, and perhaps different numbers of channels (even in the tested range). If they do, then this between-subject analysis would mask these individual differences (despite the subtitle).

      Yes, the technique assumes that different individuals have similar channels, and the results confirm this. If everyone had different channels, or different numbers of channels, we would not have found this pattern of results: an ordered scaling of correlations as a function of numerical distance. As specified in the ms, however, this technique (at least as we used it) is not sensitive enough to identify the exact number of channels, so it may have smoothed the results, 'masking' the existence of more than two channels. To avoid possible confounds related to accuracy (reproduction biases), we used Weber Fraction, a standard index of normalized sensory precision (p. 7, lines 182-183).
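
      For readers unfamiliar with the index, a minimal sketch of a Weber fraction computation follows. It assumes the common definition (standard deviation of the reproduced numbers divided by their mean); the exact formula used in the manuscript is not given in this response, and the tap counts below are invented for the example.

```python
from statistics import mean, stdev


def weber_fraction(responses):
    """Normalized precision: response variability scaled by mean response.

    Assumed definition (SD / mean); the manuscript may normalize differently.
    """
    return stdev(responses) / mean(responses)


# Hypothetical tap counts produced by one participant for the target "16".
taps_for_target_16 = [15, 17, 16, 18, 14, 17]
wf_16 = weber_fraction(taps_for_target_16)
print(f"Weber fraction at target 16: {wf_16:.3f}")
```

      Because the spread is divided by the mean, a constant bias in the number of taps (over- or under-reproduction by a fixed proportion) leaves the index unchanged, which is why it separates precision from accuracy.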

      Different channels

      I had trouble understanding much of the analyses, and this may account for at least some of my confusion. That said, as I understand it, the results are meant to provide "evidence that tuned mechanisms exist in the human brain, with at least two different tunings" because of the results of the clustering analysis and PCA. However, as the authors acknowledge, "PCA aims to summarize the dataset with the minimal number of components (channels). We can therefore not exclude the possible existence of more than two (perhaps not fully independent) channels." So I believe this technique does not provide more evidence for the existence of 2 channels as for the existence of 4 or 8 or 11 channels, the upper bound for a task testing 11 different numbers. If we can conclude that people may have one channel per number, what does "channel" mean?

      We recognise that the technique is not particularly intuitive, and we apologize for the lack of clarity.

      To clarify: we measured the precision in translating digit numbers into action sequences. This was done for different target numbers (8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32) and with N participants. For each target number, and independently for each participant, we calculated the reproduction precision (Weber Fraction). The dataset comprised a matrix where each column represents a participant, and each row a target number. Each cell contains the corresponding Weber Fraction value. This dataset was then analysed with a simple correlation, across participants. For example, the WFs provided by the N participants when tested at the target number "8" were correlated with those obtained with the target numbers 10, 11, 13...32. The results show that the correlation between "8 and 10" (low numerical distance) was higher than that obtained correlating "8 with 32" (higher numerical distance). This pattern implies that the shared variance between numbers scales with numerical distance, across participants, implying the existence of channels aggregating similar numbers (i.e. tuning selectivity). On the same dataset we then ran a PCA. This analysis provides two main components. Within each component, each target number is assigned a loading score: one for the 1st and one for the 2nd component. These loadings were plotted as a function of targets, to describe the shape of the tunings (i.e. channels).

      As stated above, we cannot really say exactly how many channels exist. These results should be interpreted as evidence for the existence of at least two channels for the transformation of numerical symbols into action sequences. This is not an obvious result at all. There is no evidence in the literature for the existence of such a mechanism in humans. In the animal (crow), as many channels were found as the numbers tested. This does not contrast with our 2-channel results, but (very likely) arises from the different resolution of the techniques. Single-cell recording surely has a higher resolution than our interindividual covariance approach. In short, we believe that the channels revealed here are likely a coarse summary representation of several underlying channels.

      We have now tried to make these points clearer (p. 7 lines 186-196; p. 15 lines 382-384; p. 16 lines 401-402).
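
      The correlation-gradient logic described in this response can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the study: the two log-Gaussian channels, the sample size, and the noise level are invented, and only the list of target numbers comes from the text.

```python
import numpy as np

targets = np.array([8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32])
rng = np.random.default_rng(1)
n_participants = 40  # hypothetical sample size


def log_gaussian(x, peak, width):
    """Log-Gaussian tuning curve (assumed channel shape)."""
    return np.exp(-0.5 * ((np.log(x) - np.log(peak)) / width) ** 2)


# Simulated Weber fractions driven by two independent channels.
low = log_gaussian(targets, 10, 0.4)
high = log_gaussian(targets, 27, 0.4)
a = rng.normal(size=(n_participants, 1))
b = rng.normal(size=(n_participants, 1))
noise = 0.005 * rng.normal(size=(n_participants, targets.size))
wf = 0.1 + 0.02 * (a * low + b * high) + noise  # participants x targets

# Correlate every pair of targets across participants (11 x 11 matrix).
corr = np.corrcoef(wf, rowvar=False)

# Express each pair's numerical distance as a log ratio and compare
# the mean correlation of "near" pairs with that of "far" pairs.
i, j = np.triu_indices(targets.size, k=1)
log_ratio = np.log(targets[j] / targets[i])
pair_corr = corr[i, j]
split = np.median(log_ratio)
near = pair_corr[log_ratio < split].mean()
far = pair_corr[log_ratio >= split].mean()

print(f"mean r, near pairs: {near:.2f}; far pairs: {far:.2f}")
```

      In this simulation, targets served by the same channel share a source of between-participant variance, so their Weber fractions covary; targets served by different channels do not. A single shared channel or fully independent noise would instead give a flat profile of correlations across distance.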

      Several other questions arose for me when thinking through this technique. If people did have two channels (at least in this range), why would they be so broad? Why would they be centered so near the ends of the tested range? Can such effects be explained by binning on the part of the participants, who might have categorized each number (knowingly or not) as either "small" or "large"? Whereas the experiment tested numbers 8-32, numbers are infinite - How could a small number of channels cover an infinite set? Or even the set 8-10,000? More broadly, I was unsure what advantages channels would have - that is - how in principle would having distinct channels for processing similar stimuli improve (rather than impede) discrimination abilities?

      This field of study is completely new, with many questions still open, including whether these channels are modulated by the numerical context such as the tested range and their extremes. The channels appear broad because, as stated above, they likely represent a coarse summary representation of several (probably sharper) underlying channels. We are now exploring the effect of numerical range and trying to modulate the tuning widths through ad-hoc experimental conditions. (p. 16 lines 401-402; p. 17 lines 450-459)

      No number perception

      I was uncertain about the analogy to studies of other continuous dimensions like spatial frequency, motion, and color. In those studies, participants view images with different spatial frequency, motion, or color - the analogy would be to see dot arrays containing different numbers of dots. Instead, here participants read written numerals (like "19"), symbols which themselves do not have any numerical properties to perceive. How does that difference change the interpretation of the effects? One disadvantage of using numerals is that they introduce a clear discontinuity: Our base-10 numerical system artificially chunks integers into decades, potentially causing category-boundary effects in people's reproductions.

      We used these sensory analogies to provide a flavour of the technique. The focus of the current study was on the individual differences in the numbers-to-actions transformation process. To this aim we decided to reduce the noise associated with the encoding of the sensory stimulus per se. Digit encoding, at least with educated adults, is indeed noiseless, eliminating this source of variability. However, we agree that looking at non-symbolic formats would be interesting. We are now collecting data with dots and flash estimations. The results (so far) are largely in line with those found here, ensuring no chunking strategies, and confirming previous literature showing sensory numerosity selective channels in humans and animals. (p. 14 lines 351-355)

      Sensorimotor

      The authors wished to test for "sensorimotor mechanisms selective to numerosity" but it's not clear what makes their effects sensorimotor (or selective to numerosity, see below). It's true they found effects using a tapping task (which like all behaviour is sensorimotor), but it's not clear that this effect is specific to sensorimotor number reproduction. They might find similar effects for numerical comparison or estimation tasks. Such findings would suggest the effect may be a general feature of numerical cognition across modalities.

      Related to the above comment, the task here was to transform noiseless symbols (digits) into (noisy) numerical action sequences. Given that the variability is thus mainly driven by the sensory-to-action process, we believe that the task can safely be considered sensorimotor in nature. (p. 14 lines 351-355)

      Yes, the same pattern of results might be found for numerical comparison or estimation tasks, but using non-symbolic formats (dots/flashes). Educated adults make no errors in naming or comparing such simple digits, making this covariance analysis impossible to perform with digit verbal estimation or comparison tasks. However, to anticipate our future results, we have preliminary data for dots and flashes verbal estimation tasks (“how many?”). The data suggest similar results, consolidating the technique, and confirming the large literature showing sensory channels for purely visual numerosity. (p. 17 lines 453-455)

      Specific to numbers

      The authors argue that their effects are "number selective" but they do not provide compelling evidence for this selectivity. In principle, their main findings could be explained by the duration of tapping rather than the number of taps. They argue this is unlikely for two reasons. The first reason is that the overall pattern of results was unchanged across the fast and slow tapping conditions, but differences in duration were confounded with numerosity in both conditions, so the comparison is uninformative. (Given this, I am not sure what we stand to learn by comparing the two tapping speeds.) The second reason is that temporal reproduction was less precise in their control condition than numerical reproduction, but this logic is unclear: Participants could still use duration (or some combination of speed and duration) as a helpful cue to numerosity, even if their duration reproductions were imperfect. If the authors wish to test the role of duration, they might consider applying the same analytical techniques they use for numbers to their duration data. Perhaps participants show similar evidence for duration-selective channels, in the absence of number, as they do for other non-numerical domains (like spatial frequency).

      The fast and slow conditions were not meant to control for duration strategies but to test for the generalizability of the results over different tapping temporal dynamics (temporal frequency in this case). The results confirmed this.

      The control for duration strategies is the comparison between precision in reproducing durations or numbers. In the number-to-action task, participants were free to use any cues, including response duration. However, it is safe to assume that the performance is dominated by the most precise feature, number in this case. In other words, in the number task if participants were reproducing the time required to give a certain number of presses, then in the timing task, where they are explicitly reproducing the same durations, they should show no lower precision. The results are opposite to that prediction. (p. 16 lines 418-420)

      Theories of numerical cognition.

      An expansive literature on numerical cognition suggests that many animals, human children, and adults across cultures have two systems for representing numerosity without counting - one that can represent the exact cardinality of sets smaller than about 4 and another that represents the approximate number of larger sets (but see Cheyette & Piantadosi, 2020). The current paper would benefit from better relating its findings to this long lineage of theories and findings in numerical approximation across cultures, ages, and species.

      The numbers used in this work were well above the subitizing limit (>N7). Indeed, the WFs found showed no signs of subitizing discontinuities. We believe that discussing the literature on subitizing here is too far from the scope of the current work.

      Additional public comments from the Reviewing Editor:

      (1) What, in the present work, makes the case that the operative mechanism is sensorimotor? The authors frame the discussion around a sensorimotor number system but the evidence here could be seen as using a sensorimotor task as one way to get at an amodal number channel. For example, the authors could do the same experiment but have people watch a circle that flashes on and off for n times, with participants reporting the number of flashes (or shown a number and asked to say more or less). They could then apply the same analyses as used here. If they got the same results, it would seem that this would be an argument against the channels being sensorimotor. I suppose if they did NOT get results in the perceptual task, then they would have (much) stronger evidence that the channels are somehow sensorimotor in nature. Either way, an experiment along these lines would be essential for addressing the nature of the channels (tied to sensorimotor or not).

      We chose to use this task because the perception of simple digits (like those used here), at least in educated adults, is noiseless. This ensures that the inter-individual variability remaining on the table is that related to the motor transformation process. For this reason, we believe that the task can be safely considered sensorimotor (see also Kirschhock & Nieder, Number selective sensorimotor neurons in the crow translate perceived numerosity into number of actions, Nature comm, 2022). (p. 14 lines 351-355)

      This is not true for verbal numerosity estimation of non-symbolic stimuli (such as dots and/or series of events). It is well known that the estimation of the latter stimuli is noisy, and there would be no sensorimotor transformation processing in the task. The inter-individual variability in estimation precision and thus the measurable channels would then reflect sensory numerosity tunings. These have been revealed with various techniques in both humans and animals. However, we are now following this idea and we have preliminary data showing that sensory channels are also detectable by the technique used in the current study. This is not in contrast with the sensorimotor nature of the channels found here, but instead indicates the existence of both sensory and sensorimotor number channels.

      The authors may argue that results from other studies such as the 2016 target article make the case about a sensorimotor basis of these channels. While I don't have a great grasp of this literature, my take on the 2016 target article is that the point was not about sensorimotor channels but about interactions between action and vision. This seems more in line with the idea of amodal number channels and indeed, they speak about a "generalized number sense" in that paper.

      The 2016 paper showed that a short period of hand tapping (adaptation) can distort visual numerosity perception. The results implied the existence of sensorimotor number channels, integrating non-symbolic numerosity (dots/flashes) and actions. The current study goes beyond this, describing (for the first time) sensorimotor channels transforming symbolic numbers into action sequences. Whether these channels are also in charge of encoding non-symbolic numerosity is an interesting open question that we are currently investigating with cross-task analyses. If the same channels are in charge of responding to non-symbolic numerosity (across space and time: dots and sequences of visual/auditory events) as well as translating digits into actions, we could then speak about a generalized sensorimotor number sense. At present, this remains a possibility, to be tested. (p. 17 lines 450-459)

      (2) There is a need for clarification on the method for creating the correlation matrices. The authors write that they look at correlations between Weber fractions between participants. By "between" do they mean "across"? That is, they calculate the Weber fraction for each individual for each cell. Then for a given cell, you correlate its Weber fraction with every other cell, using the pairs for each individual. I would call this "across" not "between." Is this just a semantic thing or have I misunderstood the process?

      To make this concrete, consider the correlation for cell 10/11. I assume it is something like

                10     11
      Subj1    .25    .31
      Subj2    .13    .09
      Subj3    .22    .16
      Etc

      And correlation across participants will be the data point for the 10/11 cell in the matrix.

      It is a semantic error; this is exactly what we did: across participants.

      To clarify further: we measured the precision in transforming numbers into sequences of actions. This was done for different target numbers (8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32) and with N participants. For each target number, and independently for each participant, we then calculated the reproduction precision (Weber Fraction). The dataset then consists of a matrix where each column represents a participant, and each row a target number. Each cell contains the corresponding Weber Fraction. This dataset was then analysed with a simple correlation, across participants. For example, the WFs of the N participants obtained when testing the target number "8" were correlated with those obtained with the target numbers "10, 11, 13...32". The results show that the correlation between "8 and 10" (low numerical distance) was higher than that obtained correlating "8 with 32" (higher numerical distance). This pattern implies that the shared variance between numbers (across participants) scales with numerical distance, in line with the existence of channels that aggregate similar numbers (tunings).

      (p. 7 lines 186-196)
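
      The across-participant correlation described above can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the toy values are the reviewer's made-up numbers for targets 10 and 11 (not real data), and the log-ratio distance between targets is an assumption chosen for illustration.

```python
import numpy as np

def wf_correlation_matrix(wf):
    """Correlate Weber fractions between target numbers, across participants.

    wf has shape (n_targets, n_participants): one row per target number,
    one column per participant, as in the dataset described above.
    """
    wf = np.asarray(wf, dtype=float)
    # np.corrcoef treats each row as a variable and each column as an
    # observation, so the result is (n_targets, n_targets).
    return np.corrcoef(wf)

def correlation_by_distance(corr, targets):
    """Pair each off-diagonal correlation with the distance between targets.

    Distance is taken as the absolute log ratio of the two target numbers
    (an assumption made for this sketch).
    """
    targets = np.asarray(targets, dtype=float)
    i, j = np.triu_indices(len(targets), k=1)
    distance = np.abs(np.log(targets[i]) - np.log(targets[j]))
    return distance, corr[i, j]

# Reviewer's toy example: rows are targets 10 and 11, columns are Subj1..Subj3.
toy_wf = [[0.25, 0.13, 0.22],
          [0.31, 0.09, 0.16]]
toy_corr = wf_correlation_matrix(toy_wf)
# toy_corr[0, 1] is the single data point for the 10/11 cell of the matrix.
```

      With the full 11 x N matrix, plotting the second output of correlation_by_distance against the first would show whether correlations fall off as numerical distance grows.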

      (3) The duration data should be analysed. While n is small, can't the authors correlate WFs across tasks? Suppose a similar pattern is observed, suggestive of >1 channel in this between-task correlation.

      One of the strengths of this technique is that it is very general: it can be applied to virtually every stimulus feature. We are currently collecting data to test the existence of generalised sensorimotor channels for continuous magnitudes: space, time, and numerosity. The logic is exactly as suggested. These correlational analyses, however, require (relatively) large samples and ad hoc experimental conditions. We do not feel confident in drawing conclusions on this with 9 participants. Out of curiosity, however, we analysed the data as requested and the results are interesting: signatures of sensorimotor channels emerge in both the number and duration tasks but NOT when analysed in conjunction (cross-task). If these results are confirmed, they would indicate the existence of separate mechanisms for the encoding of time and numerosity (and perhaps also space?).

      (4) The finding of similar results for fast and slow is quite interesting. And provides good motivation to do the duration control experiment. But two issues related to the control experiment:

      (4a) Why not look at the correlation matrix for the duration task? Was this not done because there were only 9 participants? If so, why the small n here?

      Yes, that is the reason. The aim of this work is not to investigate the existence of duration channels. This experimental condition was designed as a control for the use of non-numerical strategies in the number task. It worked well. The results were already obvious with 9 individuals (confirming Kirschhock & Nieder, Nature comm, 2022); we therefore did not consider it necessary to continue in this direction. However, related to the previous point, we ran a preliminary analysis on this small data set and (as mentioned above) signatures of sensorimotor channels (correlation gradients) emerge in both number and duration tasks but NOT when analysed in conjunction (cross-task), indicating different mechanisms. We are now pursuing this issue using different number and duration tasks.

      (4b) I don't follow why greater precision on the tapping task compared to the duration task makes a strong case against the duration hypothesis. Is the argument that, if based on duration, there should be greater precision on the duration task since the tapping task would exhibit the variability from duration PLUS added noise from tapping? If this is the argument, this should be spelled out.

      Yes. The more precise feature dominates behaviour. In other words, in the number task if participants were reproducing the time required to give a certain number of presses, then in the timing task, where they are explicitly reproducing the same durations, they should show no lower precision. The results are opposite to that prediction. (p. 18 lines 418-420)

      (4c) Related to point 3 above, one would expect based on things like Rammsayer's study that duration judgments would also engage channels. Is the idea that these are different channels in the tapping task? There seems a good case to have participants do both tapping and duration tasks and then do the correlation matrices, comparing within and between tasks.

      Please see response to 3 and 4a.

      Recommendations for the authors:

      (1) On the logic of the channel concept as applied in the current context:

      While the authors present the numerical channel idea by analogy to how this concept is used for other features such as spatial frequency or orientation, there is no input to activate the channels, just a written numeral. The channel concept would mean that to respond to say, "16", you get output from multiple channels, with each weighted by its "tuning" to 16 such that the aggregate results in approximately 16 taps. This seems a bit odd: it would be like saying to draw, I use the output from my spatial frequency channels to create an image with a particular power spectrum. The logic of the channel concept in the current experimental context needs to be reviewed and clarified.

      The channel here (probably) reflects the activity of noisy neurons responsible for translating sensory information into a numerical motor output, such as those shown by Kirschhock & Nieder (Nature comm, 2022) in crows. We used digits because their encoding (at least for such simple digits and educated adults) has no associated noise. The remaining interindividual variability, which we analysed, is thus mainly associated with the motor transformation process, revealing sensorimotor channels.

      (2) A more thorough analysis of the duration task would strengthen the paper. The n is small for this interesting control condition and the analyses presented in the current version of the paper are limited. It is recommended to make this a fully powered test with complete analyses. Consider making this a new experiment in which participants do both the tapping and duration tasks to allow cross-modal analyses.

      We ran some exploratory analyses on this, described in comments 3 and 4a. We prefer to leave this issue to dedicated future experiments (which have just started).

      (3) Expanded discussion of the limitations of the current study. The authors are clear that the methods don't provide a strong test of whether there are two or more than two channels. It would be useful to also comment on whether the estimated locations of the peaks are robust or if there is some sort of statistical bias for them to be at more extreme values. More generally, use the comments on the reviews to elaborate on various issues related to the channel concept.

      We addressed these issues in the ms (p. 17 lines 450-459).

      (4) Clarify the methods used to calculate the correlation matrix (see reviews).

      We have now specified the correlation analyses more clearly (p. 7 lines 186-196).

      (5) What is the basis for arguing that the mechanism under consideration is a "sensorimotor number system?" The data in this paper do not appear to provide evidence that the effects are linked to sensorimotor processes rather than reflect an amodal number system that is being accessed in their task through the motor system. At a minimum, present arguments for what motivates/justifies the sensorimotor claim or modify the paper to be neutral on this point.

      We have now specified the sensorimotor nature of the task used here more clearly (p. 14 lines 351-355; see also comment 1).

    2. eLife assessment

      This potentially important paper addresses the question of how numerical information is represented in the human brain. Experimental findings are interpreted as providing evidence for a sensorimotor mechanism that involves channels, each tuned to a particular numerical range. While this is an interesting application of methodologies used to identify the presence of channels, the evidence supporting the claim that these have a sensorimotor basis is incomplete.

    3. Reviewer #1 (Public Review):

      Anobile and colleagues present a manuscript detailing an account of numerosity processing with an appeal to a two-channel model. Specifically, the authors propose that the perception of numerosity relies on (at least) two distinct channels for small and large numerosities, which should be evident in subject reports of perceived numerosity. To do this, the authors had subjects reproduce visual dot arrays of numerosities ranging from 8 to 32 dots, by having subjects repetitively press a response key at a pre-instructed rate (fast or slow) until the number of presses equaled the number of perceived dots. The subjects performed the task remarkably well, yet with a general bias to overestimate the number of presented dots. Further, no difference was observed in the precision of responses across numerosities, providing evidence for a scalar system. No differences between fast and slow tapping were observed. For behavioral analysis, the authors examined correlations between the Weber fractions for all presented numerosities. Here, it was found that the precision at each numerosity was similar to that at neighboring numerosities, but less similar to more distant ones. The authors then went on to conduct PCA and clustering analyses on the Weber fractions, finding that the first two components exhibited an interaction with the presented numerosity, such that each was dominant at a distinct lower or upper range and was further well fit by a log-Gaussian model consistent with the channel explanation proposed at the beginning.

      Overall, the authors provide compelling evidence for a two-channel system supporting numerosity processing that is instantiated in sensorimotor processes. A strength of the presented work is the principled approach the authors took to identify mechanisms, as well as the controls put in place to ensure adequate data for analysis.

      One remaining question regards the secondary timing task that was used as a control. There may be interesting findings here to pursue, and so I encourage the authors or other researchers to examine those findings and explore further studies there as well.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors wish to apply established psychophysical methods to the study of numbers. Specifically, they wish to test the hypothesis - supported by their previous work - that human sensorimotor processes are tuned to specific number ranges. In a novel set of tasks, they ask participants to tap a button N times (either fast or slow), where N varies between 8 and 32 across trials. As I understood it, they then computed the Weber fraction (WF) for each participant for each number and correlated those values across participants and numbers. They find stronger correlations for nearby numbers than for distant numbers and interpret this as evidence of sensorimotor tuning functions. Two other analyses - cluster analyses and principal component analyses (PCA) - suggest that participants' performance relied on at least 2 mechanisms, one for encoding low numbers of taps (around 10) and another encoding larger numbers (around 27).

      Strengths:

      Individual differences can be a rich source of scientific insight and I applaud the authors for taking them seriously.

      Weaknesses:

      Implications of intercorrelation. The experiment "is based on the idea that interindividual variability conveys information that can reveal common sensory processes (Peterzell & Kennedy, 2016)" but I struggle to understand the logic of this technique. The authors explain it most clearly when they write "Regions of high intercorrelation between neighbouring stimuli intensity can be interpreted to imply that sets of stimuli are processed by the same (shared) underlying channel. This channel, while responding relatively more to its preferred stimulus, will also be activated by neighbouring stimuli that although slightly different from the preferred intensity, are nevertheless included in the same response distribution." Why does high intercorrelation imply a shared channel and why should it be calculated across participants? Shouldn't performance on any set of tasks (that vary in difficulty) correlate across participants? Why in principle should people have distinct channels for processing similar stimuli and how could such a system improve (rather than impede) discrimination abilities? What pattern of intercorrelation would disconfirm the existence of tuning mechanisms? And perhaps most fundamentally: What is a channel and why do they matter?

      Different channels? I had trouble understanding much of the analyses, and this may account for at least some of my confusion. That said, as I understand it, the results are meant to provide "evidence that tuned mechanisms exist in the human brain, with at least two different tunings" because of the results of the clustering analysis and PCA. But as the authors acknowledge, "PCA aims to summarize the dataset with the minimal number of components (channels). We can therefore not exclude the possible existence of more than two (perhaps not fully independent) channels." I would go a step further and say this technique does not provide more evidence for the existence of 2 channels than for the existence of 4, 8 or 24 channels, the upper bound for a task testing 24 different numbers. If we can conclude that people may have one channel per number, what does "channel" mean?

      Several other questions arise when thinking through this technique, which left me skeptical of its utility. If people did have two channels (at least in this range), why would they be so broad? Why would they be centered so near the ends of the tested range? Can such effects be explained by binning on the part of the participants, who might have categorized each number (knowingly or not) as either "small" or "large"? Or by the kind of data-binning or distributions (i.e. Gaussian) used in the analyses? Or by the physical limits and affordances of the effector participants used (i.e. their finger)? Moreover, if people had sensorimotor channels tuned to different numbers, wouldn't this cause discontinuities in their own WF? Why look at correlations across individuals rather than correlations or discontinuities within individuals? Whereas the experiment tested numbers 8-32, numbers are infinite - How could a small number of channels cover an infinite set? Or even the set 8-10,000? What would the existence of multiple such channels mean for our understanding of numerical cognition? There may be good answers to these questions, but they are not clear to this reader.

      Theories of numerical cognition. An expansive literature on numerical cognition suggests that many animals, human children, and adults across cultures have two systems for representing numerosity without counting - one that can represent the exact cardinality of sets smaller than about 4 and another that represents the approximate number of larger sets. Recent accounts suggest that what appears to be two systems can be explained by a single system of numerical approximation with limited information capacity (see Cheyette & Piantadosi, 2020). The current paper would benefit from better relating its findings to this long lineage of theories and findings in numerical approximation across cultures, ages, and species.

      Specific to numbers? The authors argue that their effects are "number selective" but they do not provide compelling evidence for this selectivity. In principle, their main findings could be explained by the duration of tapping rather than the number of taps. They argue this is unlikely for two reasons. The first reason is that the overall pattern of results was unchanged across the fast and slow tapping conditions, but differences in duration were confounded with numerosity in both conditions, so the comparison is uninformative. The second reason is that temporal reproduction was less precise in their control condition than numerical reproduction, but this logic is unclear: Participants could still use duration (or some combination of speed and duration) as a helpful cue to numerosity, even if their duration reproductions were imperfect.

      If the authors wish to test the role of duration, they might consider applying the same analytical techniques they use for number to their duration data. Perhaps participants show similar evidence for duration-selective channels, in the absence of number, as they do for other non-numerical domains (like spatial frequency).

    5. Reviewer #3 (Public Review):

      Reviewing Editor's Summary:

      The revised manuscript has clarified concerns raised by the reviewers concerning the analysis method in constructing the correlation matrix. These key results are now readily comprehensible. They have also added a final section to the Discussion, sketching some important questions for future research (e.g., number/resolution of channels and extension of the logic used here to look at number channels in other tasks).

      Reviewer 1 was satisfied with these changes and has updated their review. Reviewer 2 did not think the revision tackled the theoretical issues raised in their initial review; as such, this reviewer has opted to leave their initial public review unchanged.

      I also believe that the revision does not adequately address a major theoretical issue, namely whether the current data provide evidence of sensorimotor number channels, the central claim of the paper. The authors argue that since perception is noise free (stimuli were given symbolically), then the task variance comes from processes associated with sensorimotor transformation. Let's consider the task: A number is presented, the participant then attempts to produce that number of taps. To preclude counting, they are required to say the syllable "ba" as fast as possible while tapping. The sensorimotor channel idea would suppose that the symbolic stimulus activates a set of channels, each of which specifies the number of taps that should be produced. For example, a "6" channel likes to produce 6 outputs (with variability), a "10" channel 10 outputs (with variability), etc., with the actual production of the (weighted) integration of these outputs.

      An alternative is that, since explicit counting is prevented by the secondary task, the participant makes an internal estimation of the number of produced taps. These judgments could be based on the output of amodal number channels. For example, the same process would be at play if the task were changed such that the participants watched a dot flash and had to estimate the number of flashes (while concurrently saying "ba"). The authors indicate in their response letter that they are conducting experiments along these lines and that the results are similar. They suggest that this provides support for the existence of both sensory and sensorimotor number channels. Extending this, if the experiment were tones instead of flashes, the argument would be that there are auditory, visual, and sensorimotor number channels. It seems more parsimonious to interpret such a pattern as reflective of amodal number channels.

      I recognize there are other intriguing reasons to think there may be intimate links between our sense of number and movement, but I remain unconvinced that the current results provide evidence for sensorimotor number channels.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to first thank the Editor as well as the two reviewers for their enthusiasm and careful evaluation of our manuscript. We also appreciate their thoughtful and constructive comments and suggestions. They did, however, have concerns regarding experimental design, data analysis, and over-interpretation of our findings. We endeavored to address these concerns through refinement of our framing, inclusion of additional new analyses, and rewriting some parts of our discussion section. We hope our response can better explain the rationale of our experimental design and data interpretation. In addition, we also acknowledge the limitations of our present study, so that it will benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review)

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      Thanks very much again for the evaluation and comments. Please find our revision plans to each comment below.

      The weak points of this paper are that its findings are not sufficiently supporting their arguments, and there are several reasons for this:

      (1) Does the grid-like activity reflect 'navigation over the social space' or 'navigation in sensory feature space'? The grid-like representation in this study could simply reflect the transition between stimuli (the length of bar graphs). Participants in this study associated each face with a specific length of two bars, and the 'navigation' was only guided by the morphing of a bar graph image. Moreover, no social cognition was required to perform the task in which the grid-like activity was estimated. As for the social decision-making task, which was conducted separately, we do not know if participants needed to navigate between faces in a social space. Instead, they can recall bar graphs associated with faces and compute the decision values by comparing the length of bars. Notably, in the trust game in this study, competence and trustworthiness are not equally important to make a decision (Equation 1). The expected value is more sensitive to one over the other. This also suggests that the space might not reflect social values but perceptual differences.

      The Reviewer raises an interesting point. We apologize for not being clear enough to address this possibility in our original manuscript and we will improve the clarity in our revision. To address this issue, we would like to break it into two sub-questions and answer them separately: 1) Are participants merely memorizing the values associated with each avatar, or do they place the avatars on a two-dimensional map in their internal representation? 2) If so, are the two dimensions of this internal representation social dimensions relating to competence and trust, or sensory dimensions relating to bar height (i.e., social space or sensory space)?

      For the first question, we hope our analysis of the distance effect on the reaction time in the comparison task can address this issue. Specifically, it came from the idea that distance is a measure of similarity between two avatars in the 2D social space. The closer two avatars are, the more similar they are, hence distinguishing them will be harder and result in longer reaction time. If participants are merely memorizing the avatars as six isolated instances without integrating them into a low-dimensional map, then avatars should be equidistant (as if they were lying on the vertices of a 5-simplex), and would not show a distance effect. Therefore, we interpreted the stronger distance effect as a behavioural index of having a better internal map-like representation. This approach is adopted from the work by Park et al. (2020), where they used the distance effect to demonstrate human brains map abstract relationships among entities from piecemeal learning.
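
      The distance-effect logic above can be illustrated with a small sketch. It is not the authors' analysis code: the avatar coordinates and reaction times below are hypothetical placeholders, the function names are invented for illustration, and a simple straight-line fit stands in for whatever statistical model the study actually used.

```python
import numpy as np
from itertools import combinations

def pairwise_distances(coords):
    """Euclidean distance between every pair of avatars in the 2D space."""
    coords = np.asarray(coords, dtype=float)
    pairs = list(combinations(range(len(coords)), 2))
    dists = np.array([np.linalg.norm(coords[i] - coords[j]) for i, j in pairs])
    return pairs, dists

def distance_effect(dists, rts):
    """Slope of reaction time on distance.

    A negative slope means closer (more similar) pairs take longer to
    compare, which is the behavioural signature of a map-like representation.
    """
    slope, _intercept = np.polyfit(dists, rts, 1)
    return slope

# Hypothetical (competence, trustworthiness) coordinates for six avatars.
avatars = [(1, 2), (2, 5), (3, 1), (4, 4), (5, 3), (6, 6)]
pairs, dists = pairwise_distances(avatars)

# Hypothetical noise-free reaction times (seconds) that shorten with distance.
rts = 1.5 - 0.1 * dists
slope = distance_effect(dists, rts)
```

      If the six avatars were stored as isolated, equidistant items, every pair would have the same distance and no such slope could arise, which is the contrast the response draws.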

      For the second question of ‘social space’ vs. ‘sensory space’, our study adopted the paradigm developed by Constantinescu et al. (2016), who used a similar method to construct a conceptual space and found that such a space can be represented with a grid-like code in the entorhinal and prefrontal cortex. We stayed close to their original design and hoped that our work could provide, to some extent, a close replication of their result using non-spatial social concepts instead. Indeed, this led to a limitation of our study: participants passively traverse the artificial space rather than actively navigating it to make decisions or inferences. We also did not find evidence as strong as that reported in previous grid-like coding fMRI studies. This may have to do with low signal quality in the medial temporal region, although we are not entirely sure. Nevertheless, we don’t think our findings contradict or disprove previous findings in any way. Here we would also like to point to the work by Park et al. (2021). Their task involved making novel inferences in a 2D social hierarchy space, and they found that grid-like codes in the entorhinal cortex and medial prefrontal cortex support such novel inferences. Hence, we argue that results from these studies and partial evidence from our study collectively support the idea that the entorhinal cortex is important for representing abstract knowledge (spatial and non-spatial).

      (2) Does the brain have a common representation of faces in a social space? In this study, participants don't need to have a map-like representation of six faces according to their levels of social traits. Instead, they can remember the values of each trait. The evidence of neural representations of the faces in a 2-dimensional social space is lacking. The authors argued that the relationship between the reaction times and the distances between faces provides evidence of the formation of internal representations. However, this can be found without the internal representation of the relationships between faces. If the authors seek internal representations of the faces in the brain, it would be important to show that this representation is not simply driven by perceptual differences between bar graphs that participants may recall in association with each face.

      Considering these caveats, it is hard for me to agree if the authors provide evidence to support their claims.

      With regard to the common representation of faces, this is a potential limitation of our paradigm because our current task design didn’t include a stage of face presentation to properly test this question. With regard to the asymmetry between the two dimensions in determining expected value, we think that the prerequisite for identifying six-fold grid-like coding is to have an abstract space formed by orthogonal dimensions, i.e., competence and trustworthiness in our task are not correlated. In addition, the scanner task does not require computation of expected value. However, we do think that it is worth investigating whether the extent to which each dimension contributes to decision-making and inference will distort the grid-like representation of the map. Our prediction is that the entorhinal cortex will maintain a representation of the map invariant to this aspect so that it can support inferences in different contexts where different weights may be assigned to different dimensions. But this will be an interesting hypothesis for future studies to test. We hope that our revision plans, with the above considerations, address the Reviewer’s comments.

      Reviewer #2 (Public Review)

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits of warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid.

      From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.
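
      The orientation-estimation and alignment-test steps summarized above are commonly implemented with six-fold sine and cosine regressors. The sketch below shows that general approach on simulated data only. It is an assumption that the study followed this exact form: the function names, noise level, and the 20 degree grid orientation are hypothetical, and a real fMRI analysis would fit these regressors inside a full GLM rather than by plain least squares.

```python
import numpy as np

def estimate_grid_orientation(theta, signal):
    """Estimate the grid orientation phi (radians) from an estimation set.

    Fits signal ~ b1*cos(6*theta) + b2*sin(6*theta) + constant and
    returns atan2(b2, b1) / 6, which lies within +/- 30 degrees.
    """
    X = np.column_stack([np.cos(6 * theta), np.sin(6 * theta), np.ones_like(theta)])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return np.arctan2(beta[1], beta[0]) / 6.0

def alignment_effect(theta, signal, phi):
    """Weight on cos(6*(theta - phi)) in an independent test set.

    A positive weight means the signal is higher for trajectories aligned
    with the estimated grid than for misaligned ones.
    """
    X = np.column_stack([np.cos(6 * (theta - phi)), np.ones_like(theta)])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return beta[0]

# Simulated example with a hypothetical true orientation of 20 degrees.
rng = np.random.default_rng(0)
true_phi = np.deg2rad(20.0)

def simulate(n):
    theta = rng.uniform(0.0, 2.0 * np.pi, n)  # trajectory angles
    signal = np.cos(6 * (theta - true_phi)) + 0.5 * rng.standard_normal(n)
    return theta, signal

theta_est, signal_est = simulate(200)    # estimation set
theta_test, signal_test = simulate(200)  # independent test set

phi_hat = estimate_grid_orientation(theta_est, signal_est)
effect = alignment_effect(theta_test, signal_test, phi_hat)
```

      Estimating the orientation on one half of the data and testing alignment on the other avoids circularity, since the alignment regressor is not fitted to the same trials it is evaluated on.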

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Thank you very much again for your careful evaluation and thoughtful comments. Please find our response to the comments below.

      Weaknesses:

      In various parts of this manuscript, the authors appear to use a variety of terms to refer to the (ostensibly) same neural regions: prefrontal cortex, frontal pole, ventromedial prefrontal cortex (vmPFC), and orbitofrontal cortex (OFC). It would be useful for the authors to use more consistent terminology to avoid confusing readers.

      Thanks for pointing out the inconsistent use of terms; we will try to improve that in the revision of our manuscript.

      Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      On a conceptual level, it is not entirely clear how this work advances our understanding of grid-like encoding of two-dimensional abstract spaces, or of social cognition. The study design borrows heavily from Constantinescu et al. 2016, which is itself not an inherent weakness, but the Constantinescu et al. study already suggests that grid codes are likely to underlie two-dimensional spaces, no matter how abstract or arbitrary. If there were a hypothesis that there is something unique about how grid codes operate in the social domain, that would help motivate the search for social grid codes specifically, but no such theory is provided. The authors do note that warmth and competence likely have ecological importance as social traits, but other past studies have used slightly different social dimensions without any apparent loss of generality (e.g., Park et al. 2021). There are some (seemingly) exploratory analyses examining how individual difference measures like social anxiety and avoidance might affect the brain and behavior in this study, but a strong theoretical basis for examining these particular measures is lacking.

      We acknowledge that we used very similar dimensions to the work by Park et al. (2021). While Park and colleagues (2021) took a more innovative and rigorous approach, we tried to stay close to the original design by Constantinescu et al. (2016) with the hope that our work could provide, to some extent, a close replication of their result. Our data was collected before the 2021 paper came out and, as the comment points out, we did not find evidence as complete and convincing as in these previous grid-like coding fMRI papers. This may be due to low signal quality in the medial temporal region, although we are not entirely sure. But we don’t think our current findings can contradict or disprove previous findings in any way.

      I found it difficult to understand the analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. It is possible that I have misunderstood the authors' logic and/or methodology, but I do not feel comfortable commenting on the correctness or implications of this approach given the information provided in the current version of this manuscript.

      We apologize for not being clear enough in the manuscript and we will improve the clarity in our revision. This exploratory analysis aims to examine whether there is any correlation between the strength of the grid-like representation of the social value map and behavioral indicators of a map-like representation, and to test whether there is any correlation between the strength of the grid-like representation of this social value map and participants’ social traits. For the behavioral indicator, we used the distance effect in the reaction time of the comparison task outside the scanner. The closer a pair of avatars are, the more similar they are; hence distinguishing them will be harder and will result in a longer reaction time when making a comparison judgement. If participants are merely memorizing the avatars as six isolated instances without integrating them into a map, all avatars should be equidistant and there wouldn’t be a distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and we interpreted a stronger distance effect as a behavioral index of having a better internal map-like representation.

      It was puzzling to see passing references to multivariate analyses using representational similarity analysis (RSA) in the main text, given that RSA is only used in analyses presented in the supplementary material.

      We speculated that RSA in the entorhinal ROI might be more sensitive than the whole-brain univariate analysis at identifying a grid-like code, because a previous paper on grid-like coding in olfactory space (Bao et al., 2019) didn’t identify a grid-like representation with univariate analysis but did identify it with RSA. However, we failed to find evidence of a grid-like code in the entorhinal ROI aligned to its own putative grid orientation with the RSA approach. We reported this result in the main text to show that we carried out a relatively thorough investigation to test the hypothesis using various approaches, and decided to add references to the RSA approach in the main text as well.

      Reviewer #3 (Public Review)

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes and is relatively well-powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in the entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably by Park et al., 2021, Nature Neuroscience.

      Thanks very much again for your careful evaluation and comments. Please find our response to the comments below.

      Below, I raise a few issues and questions on the evidence presented here for a grid-like code as the basis of navigating abstract social space or social knowledge.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      This data specifically does not add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that when the space is not a uniform, square/circular, 2-dimensional space, there is no reason the code will be perfectly grid-like, i.e., show six-fold symmetry. In real-world scenarios of social space (as well as navigation, semantic concepts), it must be higher dimensional - or at least more than two-dimensional. It is unclear if this generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown/argued this to be the case. In experimental work, like in mazes from the Mosers' labs (e.g., Derdikman et al., 2009), or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), there are distortions in mEC cells, and these cells would not pass as grid cells in terms of the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raising the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thanks very much for the references to the papers that we haven’t considered enough in our discussion. We will endeavour to discuss the topic in more depth in our revision. In summary, we raise this discussion point because various research groups have found grid-like representations in 2D artificial conceptual spaces. We think that the next step towards a stronger claim would be to find such representations for more spontaneous non-spatial maps.

      Data and analysis

      (2) Concerning the negative correlation of distance with activation in the fusiform gyrus and visual cortex: this is a slightly puzzling but potentially interesting finding. However, could this be related to reaction times? The larger the distance, the longer the reaction times, so the original finding might reflect larger activations with smaller distances.

      Thanks very much for the suggestion. However, we didn’t find a correlation between response time in the choice stage of the scanner task and the negative distance activation in the fusiform gyrus (Figures below). Meanwhile, because the morph period in each trial remains the same, the negative correlation of distance with activation in the fusiform gyrus could also be interpreted as a positive correlation of morphing speed with activation in the fusiform gyrus. Indeed, stronger negative activation indicates larger activation for smaller distances, but we are uncertain what it indicates concerning the functional role of the fusiform gyrus in our current task.

      Author response image 1.

      (3) Concerning the correlation of grid-like activity with behavior: is the correlation with reaction time just about how long people took (rather than a task-related neural signal)? The authors have only reported correlations with reaction time. The issue here is that the duration of reaction times also relates to the starting positions of each trial and where participants will navigate to. Considering the speed-accuracy tradeoff, could performance accuracy be negatively correlated with these grid consistency metrics? Or it could be positively correlated, which would suggest the grid signal reflects a good representation of the task.

      We apologize for not being clear enough in the manuscript and we will improve the clarity in our revision. The reaction time used to calculate the distance effect is from a task outside the scanner. The closer a pair of avatars are, the more similar they are; hence distinguishing them will be harder and will result in longer reaction times when making comparison judgements. If participants are merely memorizing the avatars as six isolated instances without integrating them into a map, all avatars should be equidistant and there wouldn’t be a distance effect. We interpreted stronger grid-like activity as a neural index of better representation of the 2D social space, and we interpreted a stronger distance effect as a behavioural index of having a better internal map-like representation. This was the motivation behind this analysis.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

      Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron, 107(6), 1226-1238 e1228. https://doi.org/10.1016/j.neuron.2020.06.030

    2. eLife assessment

      This study provides useful initial information on how humans represent two-dimensional abstract spaces in relation to the social traits of warmth and competence. While the study poses an interesting question, the evidence for a grid-like code at present is incomplete. This study will be of interest to researchers working in the field of spatial navigation as well as the navigation of conceptual abstract space.

    3. Reviewer #1 (Public Review):

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      I acknowledge the authors' efforts to address the comments received. However, my concerns persist:

      (1) The authors contend that shorter reaction times correlated with increased distances between individuals in social space imply that participants construct and utilize two-dimensional representations. This method is adapted from a previous study by Park et al. Yet, there is a fundamental distinction between the two studies. In the prior work, participants learned relationships between adjacent individuals, receiving feedback on their decisions, akin to learning spatial locations during navigation. This setup leads to two different predictions: If participants rely on memory to infer relationships, recalling more pairs would be necessary for distant individuals than for closer ones. Conversely, if participants can directly gauge distances using a cognitive map, they would estimate distances between far individuals as quickly as for closer ones. Consequently, as the authors suggest, reaction times ought to decrease with increasing decision value, which, in this context, corresponds to distances. However, the current study allowed participants to compare all possible pairs without restricting learning experiences, rendering the application of the same methodology for testing two-dimensional representations inappropriate. In this study, the results could be interpreted as participants not forming and utilizing two-dimensional representations.

      (2) The confounding of visual features with the value of social decision-making complicates the interpretation of this study's results. It remains unclear whether the observed grid-like effects are due to visual features or are genuinely indicative of value-based decision-making, as argued by the authors. Contrary to the authors' argument, this issue was not present in the previous study (Constantinescu et al.). In that study, participants associated specific stimuli with the identities of hidden items, but these stimuli were not linked to decision-making values (i.e., no image was considered superior to another). The current study's paradigm is more akin to that of Bao et al., which the authors mention in the context of RSA analysis. Indeed, Bao et al. controlled the length of the bars specifically to address the problem highlighted here. Regrettably, in the current paradigm, this conflation remains inseparable.

      (3) While the authors have responded to comments in the public review, my concerns noted in the Recommendation section remain unaddressed. As indicated in my recommendations, there are aspects of the authors' methodology and results that I find difficult to comprehend. Resolving these issues is imperative to facilitate an appropriate review in subsequent stages.

      Considering that the issues raised in the previous comments remain unresolved, I have retained my earlier comments below for review.

      The weak point of this paper is that its findings do not sufficiently support the authors' arguments, and there are several reasons for this:

      (1) Does the grid-like activity reflect 'navigation over the social space' or 'navigation in sensory feature space'? The grid-like representation in this study could simply reflect the transition between stimuli (the length of bar graphs). Participants in this study associated each face with a specific length of two bars, and the 'navigation' was only guided by the morphing of a bar graph image. Moreover, no social cognition was required to perform the task in which the grid-like activity was estimated. For the social decision-making task, which was conducted separately, we do not know if participants needed to navigate between faces in a social space. Instead, they could recall the bar graphs associated with the faces and compute the decision values by comparing the lengths of the bars. Notably, in the trust game in this study, competence and trustworthiness are not equally important for making a decision (Equation 1). The expected value is more sensitive to one than to the other. This also suggests that the space might not reflect social values but perceptual differences.

      (2) Does the brain have a common representation of faces in a social space? In this study, participants don't need to have a map-like representation of six faces according to their levels of social traits. Instead, they can remember the values of each trait. The evidence of neural representations of the faces in a 2-dimensional social space is lacking. The authors argued the relationship between the reaction times and the distances between faces provides evidence of the formation of internal representations. However, this can be found without the internal representation of the relationships between faces. If the authors seek internal representations of the faces in the brain, it would be important to show that this representation is not simply driven by perceptual differences between bar graphs that participants may recall in association with each face.

      Considering these caveats, it is hard for me to agree that the authors provide evidence to support their claims.

    4. Reviewer #2 (Public Review):

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid. From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.
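
      The three procedures summarized above can be sketched in a few lines. This is a generic illustration of the common sine/cosine formulation of six-fold (hexagonal) modulation, not the authors' pipeline: the function names `fit_hexagonal_modulation` and `is_aligned` are hypothetical, the signal is synthetic, and for brevity the orientation is estimated and checked on the same data, whereas in practice the orientation is typically estimated on one part of the data and tested on an independent part.

```python
import math


def fit_hexagonal_modulation(angles, signal):
    """Fit signal ~ b0 + b_cos*cos(6*theta) + b_sin*sin(6*theta).

    Returns (orientation, amplitude): orientation is the putative grid
    angle in radians, within (-pi/6, pi/6]; amplitude is the strength of
    the six-fold modulation.
    """
    n = len(angles)
    c = [math.cos(6 * t) for t in angles]
    s = [math.sin(6 * t) for t in angles]
    # Centre all variables so the intercept drops out of the normal equations.
    c_mean, s_mean, y_mean = sum(c) / n, sum(s) / n, sum(signal) / n
    c = [v - c_mean for v in c]
    s = [v - s_mean for v in s]
    y = [v - y_mean for v in signal]
    scc = sum(v * v for v in c)
    sss = sum(v * v for v in s)
    scs = sum(a * b for a, b in zip(c, s))
    scy = sum(a * b for a, b in zip(c, y))
    ssy = sum(a * b for a, b in zip(s, y))
    det = scc * sss - scs * scs
    b_cos = (sss * scy - scs * ssy) / det
    b_sin = (scc * ssy - scs * scy) / det
    return math.atan2(b_sin, b_cos) / 6, math.hypot(b_cos, b_sin)


def is_aligned(angle, orientation):
    """True when a trajectory lies within 15 degrees of one of the six grid axes."""
    return math.cos(6 * (angle - orientation)) > 0


# Synthetic example: trajectories every 10 degrees, grid rotated by 12 degrees.
angles = [math.radians(a) for a in range(0, 360, 10)]
true_orientation = math.radians(12)
signal = [2.0 + 0.5 * math.cos(6 * (t - true_orientation)) for t in angles]

orientation, amplitude = fit_hexagonal_modulation(angles, signal)
```

      The amplitude corresponds to step 1 (is there hexagonal modulation at all), the orientation to step 2, and `is_aligned` labels trajectories for the aligned-versus-misaligned contrast in step 3.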

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Weaknesses:

      In the revised manuscript, the authors soften their claims about finding a grid code in the entorhinal cortex and provide additional caveats about limitations in their findings. It seems that the authors and reviewers are in agreement about the following weaknesses, which were part of my original review: Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      In the authors' response to reviews, they provide additional clarification about their exploratory analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. My guess is that readers would find it useful if some of this language were included in the main text, especially with regard to an explanation regarding the rationale for these exploratory studies.

    5. Reviewer #3 (Public Review):

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes, and is relatively well-powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in the entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably Park et al., 2021, Nature Neuroscience.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      This data specifically does not add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that when the space is not a uniform, square/circular, 2-dimensional space, there is no reason the code will be perfectly grid-like, i.e., show six-fold symmetry. In real-world scenarios of social space (as well as navigation, semantic concepts), it must be higher dimensional - or at least more than two-dimensional. It is unclear if this generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown/argued this to be the case. In experimental work, like in mazes from the Mosers' labs (e.g., Derdikman et al., 2009), or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), there are distortions in mEC cells, and these cells would not pass as grid cells in terms of the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raising the issue for future work to address the problem - or if the authors think it is a problem at all.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We wish to thank the reviewers for their helpful and insightful comments. Their concerns mainly related to the interpretation of the data, and their comments helped us clarify our statements and improve our discussion.

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting study. It involves the utilization of hippocampal neuronal cultures from syntaxin 1 knock-out mice. These cultures serve as a platform for monitoring changes in synaptic transmission through electrophysiological recording of postsynaptic currents, upon lentiviral infection with various isoforms, chimeras, and point mutations of syntaxins.

      The authors observe the following:

      (1) Syntaxin2 restores neuronal viability and can partially rescue Ca2+-evoked release in syntaxin1 knock-out neurons, but release is much slower (cumulative charge transfer differences) and the RRP is clearly smaller than when rescued with syntaxin1. In contrast, syntaxin2-mediated rescue leads to a large increase in spontaneous release (Figure 1). Convincingly, the authors conclude that syntaxin 1 is optimized for fast phasic release and for clamping of spontaneous release, in comparison with syntaxin2.

      (2) The replacement of the SNARE domain (or its C-terminal part) of syntaxin1 by the SNARE domain of syntaxin2 (or its C-terminal part) rescues the fast kinetics, but not the amplitude, of Ca2+-evoked release. This is associated with a decrease in the size of the RRP and an increase in spontaneous release. The probability of vesicular release (PVR) is slightly increased, which is intriguing because a slight decrease would be expected instead given the reduced RRP, indicating that an enhancement of Ca2+-dependent fusion is occurring at the same time by unknown mechanisms, as the authors properly point out. Analogous experiments, in which the SNARE domain of syntaxin1 is placed into syntaxin2, reveal the existence of differential regulatory elements outside the SNARE domain.

      (3) Different constructs of syntaxin 1 and syntaxin 2 display different expression levels. On the other hand, the expression levels of Munc-18 are associated with the characteristics of the transfected specific syntaxin construct. In any case, the electrophysiological phenotypes cannot be consistently explained by changes in Munc-18.

      (4) Mutations in several residues of the outer surface of the C-terminal half of the syntaxin1 SNARE domain lead to alterations in the RRP and the frequency of spontaneous release, but the changes cannot be attributed to a change in the net surface charge, because the alterations occur even in paired mutations in which electrical neutrality is conserved.

      Comments:

      (1) This is a comment regarding the interpretation of the results. In general, the decrease in the RRP size is associated with the increased frequency of spontaneous release due to unclamping. The authors claim that both phenomena seem to be independent of each other. In any case, how can the authors discard the possibility that the unclamping of spontaneous release leads to a decrease in the RRP size?

      The main argument against the reduction of the RRP being caused by the observed increase in mEPSC frequency is based on the kinetics of refilling and depletion. On average, a primed vesicle fuses spontaneously after 500 – 1000 seconds (spontaneous vesicle release rate – STX1 Figure 1, Figure 2 and Figure 3). The time it takes to refill the RRP after depletion is on the order of 3 seconds (Rosenmund and Stevens, 1996). Therefore, the refilling of the RRP is more than 100 times faster. Even if spontaneous release increased 5-fold, this would deplete less than 5% of the RRP at steady state.
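
      The arithmetic behind this argument can be checked with a small sketch. It assumes a simple two-state model that is not stated in the response itself: each release site is either primed or empty, is refilled at rate 1/τ_refill, and fuses spontaneously at rate 1/τ_spont, so that the depleted fraction at steady state is k_spont / (k_refill + k_spont). The function name `steady_state_depletion` is hypothetical; the time constants are the ones quoted above (3 s refilling, 500 s as the faster end of the spontaneous range).

```python
def steady_state_depletion(tau_refill_s, tau_spont_s):
    """Fraction of the RRP that is empty at steady state in a two-state model.

    tau_refill_s: mean time to refill an empty site (seconds)
    tau_spont_s:  mean time before a primed vesicle fuses spontaneously (seconds)
    """
    k_refill = 1.0 / tau_refill_s
    k_spont = 1.0 / tau_spont_s
    return k_spont / (k_refill + k_spont)


# Values quoted in the response: ~3 s refilling, 500 s spontaneous fusion time.
baseline = steady_state_depletion(3.0, 500.0)

# A 5-fold increase in spontaneous release shortens the fusion time 5-fold.
five_fold = steady_state_depletion(3.0, 500.0 / 5)
```

      Under these assumptions the baseline depletion is about 0.6% of the pool and the 5-fold case about 2.9%, consistent with the statement that the depletion stays below 5%.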

      (2) The authors have analyzed the kinetics of mEPSCs and found differences (Fig2-Supp. Fig1; Fig2-Supp. Fig1). It would be interesting and pertinent to discuss these data in the context of potential phenotypes in the fusion pore kinetics involving syntaxin1 and syntaxin2 and their SNARE domains. Indeed, the figure will improve by including averaged traces of mEPSCs.

      We thank the reviewer for the idea. Upon closer examination of the changes in mEPSC rise time and mEPSC decay time, we noticed a minor slowing in the mEPSC rise time from 0.443 ms (SEM 0.0067) for STX1A to 0.535 ms (SEM 0.0151) for STX1A-2(SNARE) or 0.507 ms (SEM 0.01251) for STX1A-2(Cter), while the mEPSC half-widths did not change significantly. It is possible that the measured change is related to the detection algorithm, as mEPSC detection at elevated frequencies becomes more difficult due to increased overlap of events; we therefore prefer to refrain from making any mechanistic claims.

      Minor comments:

      (1) Fig2 J; Fig 3 J. It is difficult to distinguish between different colors and implementing a legend within the graph will be very helpful.

      (2) Fig3 H. Please change the color of the box plot for Stx1 A to improve the contrast with the individual data points.

      (3) Page 6. Line 225. "Figure 2D and E" should be corrected to "Figure 2C and D"

      (1) Colors were changed for clearer visualization. (2) Unfortunately, changing the color did not improve the contrast with the individual plots. However, the numerical data is all included in the data sheets of the corresponding figure. (3) The mistake was corrected.

      Reviewer #2 (Recommendations For The Authors):

      Line 135-136: Are the numbers cited in the text mean and SEM? Please indicate.

      Line 139 and Figure 1G: The difference between purple and blue was very hard to see on my hard copy.

      Line 152: Reference to Figure 1L should probably be 1K.

      Line 183: Reference to Figure 2C should probably be Figure 2F.

      Line 225: Reference to Figure 2D and 2E should probably be 2C and 2D.

      Line 239: Reference to Figure 3I should probably be 3H.

      All typos were addressed and colors were changed for better visualization.

      Line 210-211: Sentence ("One of the benefits..") is hard to understand.

      Thank you for noticing this mistake. We agree that the sentence did not add any important or new information, so it was deleted. Additionally, the message of the mentioned sentence was already clearly stated in lines 209-211.

      Figure 4E-H misses data for STX2, for the figure to be arranged like Figure 5.

      Given that STX1 is the endogenous syntaxin in hippocampal neurons, we use it as a control for all the analyses done in the STX2 and STX2-chimera experimental groups; thus it is included in Figures 3 and 5.

      It appears that the authors do not present or discuss the Western Blot in Fig. 4D. Are the quantitative results of the Western Blot consistent with or different from the quantification of the immunostainings (Fig. 4B-C)? A similar question for Figure 5D, which also seems not to be presented.

      In terms of quantification, we have relied mainly on the ICC experiments because they test also for putative impairments in transport to the presynaptic compartment. Our WB data are overall consistent with the results, but were not used to quantitate expression of our syntaxin chimeras and mutations in the STX1-null hippocampal neuron model.

      Figure 6F-G: The normalization of spontaneous vesicular release rates is not clear, because the vesicular release rates already contain a normalization (mEPSC rate divided by RRP size). Is a further normalization of the STX1A condition informative? The authors should consider presenting the release rates themselves. In any case, the normalization should be presented/explained, at least in the legends.

      The reviewer is in principle correct. Due to the large number of experimental groups, we had to perform recordings from multiple cultures in which not all experimental groups were present, while the WT STX1 was present as a consistent control. To reduce culture-to-culture variability, additional normalization to the WT control group was performed. However, we also included the raw numerical values in the data-source sheets (normalized and absolute), which produce a similar overall outcome.
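
      The normalization described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function name `normalize_to_control`, the record layout, the group label "mutant", and the numerical values are all hypothetical. Each value is divided by the mean of the control group recorded in the same culture, so that groups measured in different cultures are expressed relative to the same reference.

```python
from collections import defaultdict


def normalize_to_control(records, control="STX1A"):
    """Divide each value by the mean control value from the same culture.

    records: list of (culture_id, group, value) tuples
    Returns a list of (culture_id, group, normalized_value) tuples.
    Raises ValueError if a culture has no control recordings.
    """
    control_values = defaultdict(list)
    for culture, group, value in records:
        if group == control:
            control_values[culture].append(value)

    normalized = []
    for culture, group, value in records:
        refs = control_values.get(culture)
        if not refs:
            raise ValueError(f"culture {culture!r} has no {control} recordings")
        reference = sum(refs) / len(refs)
        normalized.append((culture, group, value / reference))
    return normalized


# Hypothetical release rates from two cultures (arbitrary units).
records = [
    (1, "STX1A", 2.0), (1, "STX1A", 4.0), (1, "mutant", 6.0),
    (2, "STX1A", 10.0), (2, "mutant", 5.0),
]
normalized = normalize_to_control(records)
```

      After this step the control group averages 1 within every culture, which is why reporting the absolute values alongside the normalized ones, as the authors do in the data sheets, helps readers judge the underlying rates.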

      References to Figure 7 subpanels (A, B, and C) are missing.

      Thank you for the comment. We have integrated all panels into one for better representation and understanding since they are representative of one another.

      Lines 330-339 and Figure 7 in Discussion: the authors discuss that adding the non-cognate STX2 SNARE-domain to syntaxin-1 might destabilize the primed state and decrease the fusion energy barrier (as indicated in Figure 7C). What is the evidence that the decrease in RRP size is not caused solely by the depletion of the pool due to the increased spontaneous fusion?

      Please see the comments to major point 2 of reviewer 1.

      Statistics: Missing is the number of observations (n) for all data. Even if all data points are displayed, this should be stated.

      N numbers are included in the data sheets attached to each figure.

      The statement (start of Discussion,) that the SNARE-domain of STX1 'plays a minimal role in the regulation for Ca2+-evoked release' is somewhat puzzling, since without the SNARE-domain in STX1 there would be no Ca2+-evoked release. I guess these statements (similar statements are found elsewhere) are due to the interesting finding that STX2 leads to a decrease in release kinetics, compared to STX1, and this is not (entirely) due to differences in the SNARE-domain. I would suggest rephrasing the finding in terms of release kinetics. Also, the statement in the last sentence of the Abstract is not clear.

      Thank you for pointing this out. We agree that our experiments showed a strong impact of the syntaxin isoform exchange on release kinetics and overall release output. A similar comment also came from reviewer #3, so we have addressed both comments together.

      Our confusing statement resulted from the order of the presented results and our summarizing remarks for each section. Our statement reflected our finding that mutating residues in the C-terminal part of the STX1 SNARE motif affected only spontaneous release and RRP size but not release efficacy. We now state (pg. 6 lines 231-233) that the data observed from the comparison of “the results obtained from the Ca2+-evoked release between STX1 and STX2 support major regulatory differences of the domains outside of the SNARE domain between isoforms”.

      We have changed the abstract pg. 2 lines 55-56

      We have changed the introduction pg. 3 lines 102-105 for a better contextualization.

      We have changed the start of the discussion pg. 9 lines 250-252 for better contextualization.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, Salazar-Lázaro et al. presented interesting data showing that the C-terminal half of the Syx1 SNARE domain is responsible for clamping of spontaneous release, stabilizing the RRP, and also Ca2+-evoked release. The authors routinely utilized the chimeric approach to replace the SNARE domain of Syx1 with its paralogue Syx2 and analyzed the neuronal activity through electrophysiology. The data are straightforward and fruitful. The conclusions are partly reasonable. One obvious drawback is that they did not explore the underlying mechanism. I think it is easy for the authors to carry out some simple assays to verify their hypothesis for the mechanism, instead of just talking about it in the discussion section. In all, I appreciate the data presented in the manuscript. If the authors could supply more data on the mechanisms, this would be important research in the field. Some critical comments are listed below:

      We thank the reviewer for his/her comments and suggestions.

      Major comments:

      (1) In pg.3, lines 102-104, the authors stated that 'We found that the C-terminal half of the SNARE domain of STX1.. ..while it is minimally involved in the regulation of Ca2+-evoked release.' But in pg.5, lines 174-176, they wrote that 'Replacement of the full-SNARE domain (STX1A-2(SNARE)) or the C-terminal half (STX1A-2(Cter)) of the SNARE domain of STX1A with the same domain from STX2 resulted in a reduction in the EPSC amplitude (Figure 2B).' and in pg.5-6, lines 197-199, they wrote that 'Taken together our results suggest that the C-terminal half of the SNARE domain of STX1A is involved in the regulation of the efficacy of Ca2+-evoked release, the formation of the RRP and in the clamping of spontaneous release.' It puzzles me a lot as to what the authors are really trying to express for the relationship between C-half of the SNARE complex and Ca2+-evoked release (i.e., minimally involved or significantly participate in the process?). Please clarify and reorganize the contexts.

      Please see our reply to the last comment of reviewer 2.

      (2) Figure 1-figure supplement 1, the authors should analyze Syx1/VGlut1 level additionally. And, if possible, compare the difference between Syx1/VGlut1 and Syx2/VGlut1.

      The levels of STX1/VGlut1 and STX2/VGlut1 were analyzed in detail in Figures 4 and 5.

      The direct comparison between the expression levels of these two proteins is not possible since affinities of the antibodies to the target proteins are different and can induce potential biases. While this could be overcome by the use of a FLAG-tag to the syntaxin proteins, we have not utilized this approach in this publication. We in addition inferred sufficient and comparable expression of both syntaxins from their ability to rescue some of syntaxin1 loss of function phenotypes.

      (3) Figure 2D only analyzed the EPSC half-width, could the author alternatively analyze the rise/decay time? Also, in Figure 3-figure supplement 1, does it refer to the kinetic parameters of Syx2-1A in Figure 3? It is very confused.

      We have changed the text accordingly and each parameter is referenced to its corresponding figure for clarity. As for the decay and rise time of STX1 and STX1-chimeras, they are in Figure 2-figure supplement 1A and B.

      (4) On pg.4, lines 151-152, 'Finally, no change was observed in the paired-pulse ratio (PPR) between STX1A and STX2 groups (Figure 1L).' does not contain any explanations and comments for this observation in the texts.

      The small EPSC amplitudes and altered kinetics of the STX2 constructs (Figure 1 and Figure 3) made it more difficult to quantify the paired-pulse experiments. Therefore, we preferred not to overinterpret these measurements. The finding that the paired-pulse data were not significantly different fits with the vesicular release probability measurements, which showed no major changes. We have made our statement on this basis.

      (5) On pg.6, lines 235-236, the authors wrote that 'Additionally, we found that only STX2-1A(SNARE) and STX2-1A(Cter) could rescue the RRP to around double of what we measured from STX2 and STX2-1A(Nter) (figure 3F)'. However, in Figure 3F, the authors indicated 'n.s.' (p>0.05) for the differences between STX2 and STX2-1A(SNARE)/STX2-1A(Cter). It is perplexing how the authors interpret their data. Definitely, the p-value could not be arbitrarily used as a criterion of difference. An easier way is that indicating the exact p-values for each comparison (indicate in figure legends or list in tables).

      We apologize for any confusion and hope the modification makes our interpretation clearer. The calculated p-values are included in the attached source data tables, which we hope will clarify our comparative analysis. We have changed the text on pg. 7, lines 238-241; we are cautious not to overinterpret these results and rely more on the data observed in STX1A-chimeras, which show significant changes in the RRP.

      (6) I noticed that the authors preferred using 'xx% increase/decrease' or 'xx-fold increase/decrease' to interpret their inter-group data. I would doubt whether the interpretations are appropriate. First, it seems that most of the individual scatters from one set were not subject to Gaussian distribution; also, the authors utilized non-parameter tests to compare the differences. Second, the authors did not explicitly indicate the method to calculate the % or fold, e.g., by comparing mean value or median. I think it is a bad choice to use the median to calculate fold changes; meanwhile, the mean value would also be biased, given the fact that the data were not Gaussian-distributed. The authors should be cautious in interpreting their data.

      We thank the reviewer for pointing out the inaccuracy of our descriptions and have included the parameter used to calculate the percentage and fold increase/decrease, specifically the mean, in the materials and methods section. Our intention is to plainly state the amount of change seen in a parameter based on the observed changes in the mean value. We agree with the reviewer that interpreting this could be problematic if we are speculating about possible mechanisms. Further tests should be conducted to establish whether similar increases/decreases in a parameter are due to the disturbance of the same or of different mechanisms. E.g., we discussed whether or not the regulation of SYT1 might be the mechanism affected in some of the chimeras that show an increase in the spontaneous release rate, because the release rate observed in some is massively higher than that seen in SYT1-KO (Bouazza-Arostegui et al., 2022). It is tempting to speculate that it could be due to other mechanisms based on the differences in the changes. For this reason, we have given an array of possible mechanisms affected when we manipulate the SNARE domain of STX1.
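
      As a minimal sketch of the calculation described in this response, fold and percentage changes can be derived from group means as shown here. The function names and the data are hypothetical and are not taken from the manuscript. The median-based variant is included only to illustrate the reviewer's concern that the two summaries can diverge for skewed, non-Gaussian data.

```python
from statistics import mean, median

def fold_change(control, treated, summary=mean):
    """Ratio of the treated summary statistic to the control summary statistic."""
    return summary(treated) / summary(control)

def percent_change(control, treated, summary=mean):
    """Signed percentage change of the treated group relative to control."""
    return (fold_change(control, treated, summary) - 1.0) * 100.0

# Hypothetical, skewed example data in arbitrary units (not values from the study).
control = [1.0, 2.0, 3.0, 4.0, 10.0]
treated = [2.0, 3.0, 4.0, 5.0, 26.0]

mean_fold = fold_change(control, treated)            # based on means: 8.0 / 4.0
median_fold = fold_change(control, treated, median)  # based on medians: 4.0 / 3.0
mean_percent = percent_change(control, treated)      # percentage change of the mean
```

      In this made-up example a single large value in the treated group doubles the mean while the median rises by only about a third, which is why stating the summary statistic explicitly matters.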

      (7) The authors routinely analyzed the levels of Munc18-1 in neuronal lysates by WB and Munc18-1/VGlut1 by immunofluorescence in various Syx1 mutants. However, in my view, these assays were slightly indirect. It is evident that the SNARE domain of Syx1 participates in the binding to Munc18-1 according to the atomic structures (pdb entries: 3C98 and 7UDB). Meanwhile, Han et al. reported that K46E mutation (located in domain 1 of Munc18-1) strongly impairs Syx1 expression, Syx1-interaction, vesicle docking and secretion (Han et al., 2011, PMID: 21900502). Intriguingly, the residue K46 of Munc18-1, which is close to D231/R232 of Syx1, may have potential electrostatic contacts to D231 and R232 of Syx1. This is reminiscent of the possibility that Syx1D231/R232 and some Syx1-2 chimeras lost their normal function through their defective binding to Munc18-1. To better understand the underlying mechanism, the authors may need to carry out in vivo and/or in vitro binding analysis between syntaxin mutants/chimeras and Munc18-1. They also need to conduct more discussions about the issue.

      We express our gratitude for the identification of a previously overlooked aspect in our investigation of the interplay between Munc18-1 and STX1. In response, we have incorporated additional discourse on this matter in pg11 lines 419-431.

      Additionally, we appreciate the thoughtful suggestion regarding additional experiments to further explore the molecular relationship between Munc18-1 and STX1. We agree that co-immunoprecipitation experiments (either by using an antibody against Munc18-1 or STX1 and STX2) would offer greater insight into whether the binding of these proteins is affected in the isoform or the mutants. Notably, we performed immunoprecipitation experiments using neuronal lysates of the corresponding groups and STX1A and STX2 antibodies for the pull-downs. However, we were unable to co-IP Munc18-1 when doing so. Changing the conditions of the experiment did not yield better results, and so these experiments remain inconclusive for the moment. For this reason, we included it as an open question and a potential concluding hypothesis of the molecular mechanism. However, Shi et al., 2021, have performed co-IP assays in HeLa cells using Munc18-1-wt and a mutant form which affects the binding to the C-terminal half of the SNARE domain of STX, together with STX1-wt and STX mutants targeting some of our residues of interest, and showed a decrease in the pulled-down levels of Munc18-1. We have made sure to mention the conclusion of this important publication in our discussion.

      (8) The third possible mechanism (i.e., interaction with Syt1) proposed by the authors seems more reasonable. However, the discussions raised by the authors were not enough. For instance, plenty of literature has indicated that Syt1 may participate in synaptic vesicle priming through stabilizing partially or fully assembled SNARE complex (Li et al., 2017, PMID: 28860966; Bacaj et al., 2015, PMID: 26437117; Mohrmann et al., 2013, PMID: 24005294; Wang et al., 2011; PMID: 22184197; Liu et al., 2009, PMID: 19515907); complexins are also SNARE binding modules that regulate synaptic exocytosis. Lack of complexins could lead to unclasping of spontaneous fusion of synaptic vesicles, though it causes severe Ca2+-triggered release at the same time (Maximov et al., 2009, PMID: 19164751). Meanwhile, different domains of complexin may accomplish different steps of SV fusion, early research had indicated that the C-terminal sequence of complexin is selectively required for clamping of spontaneous fusion and priming but not for Ca2+-triggered release (Kaeser-Woo et al., 2012, PMID: 22357870). Likewise, if possible, the authors may need to carry out in vivo and/or in vitro binding analysis to confirm their hypothesis.

      The exploration of complexin's involvement was limited in our study primarily due to our methodological focus on comprehending molecular mechanisms concerning the sequence disparities between STX1 and STX2. Our laboratory has studied the role of Complexin extensively, and we certainly have had a possible involvement in mind. However, since the sites identified on syntaxin are either conserved between STX1 and STX2 or not close to the central or accessory helical domains of complexin, we did not perform experiments to test putative interactions, and we refrained from discussing complexin in this paper.

      (9) Lastly, I would suspect that whether the defects of Syx2 and Syx1 chimeras were caused by the SNARE complex itself, from another point of view that is different from the hypothesis raised by the authors. Changing the outward residues (or we say the solvent-accessible residues) of the SNARE complex may affect the stability, assembly kinetics, and energetics (Wang and Ma, 2022, PMID: 35810329; Zorman et al., 2014, PMID: 25180101), especially for the C-terminal halves. Is this another possible mechanism through which the C-terminus of Syx1 might contribute to SV priming and clamping of spontaneous release? The authors should at least conduct some discussions about the point.

      Thank you for this suggestion. We indeed assumed that, since the hydrophobic layers of the SNARE domains that form the hydrophobic pocket of STX2 and STX1 are mainly conserved, the intrinsic stability of the SNARE complex is largely unchanged. Additionally, Li et al., (2022) PMID: 35810329 examined the stability of the alpha-helix structure of the SNARE domain of SNAP25. And while they found no changes in the stability and formation of the alpha-helix when mutating outwards-facing residues for methodological purposes (bimane-tryptophan quenching), their study did not selectively explore the effect of mutations of outer-surface residues on the stability of the alpha-helix.

      Zorman et al., (2014) PMID: 25180101, as noted by the reviewer, observed that changes in the sequence of the SNARE domain (by using SNARE proteins from different trafficking systems (neuron, GLUT4, yeast…)) correlated with changes in the step-wise SNARE complex assembly. However, they also did not selectively mutate the outer solvent-accessible residues, hindering conclusive speculations on the contribution of said residues to the kinetics and energetics of assembly and the intrinsic stability of the SNARE complex.

      Upon petition of the reviewer, we have added this paragraph to discuss an additional mechanism:

      “As a final remark, it is possible that the changes in the spontaneous release rate and the priming stability may stem from a reduced stability of the SNARE complex itself through putative interactions between outer surface residues. Studies of the kinetics of assembly of the SNARE complex which mutate solvent-accessible residues in the C-terminal half of the SNARE domain of SYB2 have shown reduction in the stability of the SNARE complex assembly and are correlated with impaired fusion (Jiao et al., 2018). However, STX1 mutations of outward residues were inconclusive and were always accompanied by hydrophobic layer mutations (Jiao et al., 2018), which affect the assembly kinetics and energetics of the SNARE complex (Ma et al., 2015). Single molecule optical-tweezer studies have focused on the impact of regulatory molecules on the stability of assembly such as Munc18-1 (Ma et al., 2015; Jiao et al., 2018) and complexin (Hao et al., 2023), or on the intrinsic stability of the hydrophobic layers in the step-wise assembly of the SNARE complex (Gao et al., 2012; Ma et al., 2015; Zhang et al., 2017). Although the conserved hydrophobic layers in the SNARE domains of STX1A and STX2 (Figure 1) suggest unchanged zippering and intrinsic stability of the complex, further studies addressing the contribution of surface residues on the stability of the alfa-helix structure of the SNARE domain of STX1 (Li et al., 2022) or the stability of the SNARE complex should be conducted.”

      Minor comments:

      (1) In pg.6, line 236, 'figure 3F', the initial 'f' should be uppercased.

      (3) On pg.11, line 396, the section title 'The interaction of the C-terminus of de SNARE domain of STX1A with Munc18-1 in the stabilization of the primed pool of vesicles.' The word 'de' is confusing, please check.

      (4) In pg.12, line 446, the section title, should 'though' be 'through'?

      These comments have been acknowledged and the text changed accordingly. Thank you.

      (2) In pg.7, line 239, '..had an increased PVR (Figure 3G), no change in the release rate (Figure 3I)', should Figure 3I be Figure 3H? and line 240, 'and an increase in short-term depression during 10Hz train stimulation (Figure 3I)', should Figure 3I be Figure 3J? If so, Figure 3I will not be cited in the texts and lack adequate interpretations. Please check.

      We apologize for the oversight in not referencing this specific subpanel of the figure and have incorporated the reference in the text. Additionally, our interpretation of this data is connected to the mechanisms that govern efficacy of Ca2+-evoked response, and its dependence on the integrity of the entire-SNARE domain. We wish to highlight the modifications made to the discussion on the regulation of the Ca2+-evoked response based on previous reviewer comment #1, and a similar comment from reviewer #2 (as stated previously).

    2. eLife assessment

      This important study presents a series of results to uncover the role of C-terminal half of the Syx1 SNARE domain. The evidence supporting the conclusions is convincing. The paper will be of broad interest to biophysicists and neurobiologists.

    3. Reviewer #1 (Public Review):

      In this systematic and elegant structure-function analysis study, the authors delve into the intricate involvement of syntaxin 1 in various pivotal stages of synaptic vesicle priming and fusion. The authors use an original and fruitful approach based on the side-by-side comparison of the specific contributions of the two isoforms syntaxin 1 and syntaxin 2, and their respective SNARE domains, in priming, spontaneous and Ca2+-dependent glutamate release. The experimental approach, mastered by the authors, offers an ideal means of unraveling the molecular roles played by syntaxins. Although it is not easy to come up with a model explaining all the observed phenotypes, the authors carefully restrict their conclusions to the role of the C-terminal half of the syntaxin1 C-terminal SNARE domain in the maintenance of the RRP and the clamping of neurotransmitter release. The study is carefully carried out, the conclusions are supported by high quality data and the manuscript is clearly written. In addition, the study clearly sets new questions that open new paths for future experimental work.

    4. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Salazar-Lázaro et al. systematically dissects out the different functional properties of the SNARE-domains of syntaxin-1 and syntaxin-2. By systematically substituting the SNARE-domain (or its C- or N-terminal half) into the non-cognate counterpart, the authors find that the C-terminal half of the SNARE-complex is especially important for maintaining RRP size and clamping spontaneous release. They also mutate single residues, to further nail down the effect. Overall, this is an interesting manuscript, which sheds light on the functionality of different co-expressed SNARES.

      Strengths:

      The strength of the manuscript is the systematic dissection, using substitution of either SNARE-domain into the other syntaxin, together with the state-of-the art methods. The authors follow up with a substitution of single and paired residues. This is a large undertaking, which has been very well carried out.

      Weaknesses:

      No major weaknesses. The large number of experiments paint a somewhat complicated picture because the process under study is complicated.

    5. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Salazar-Lázaro et al. presented interesting data that C-terminal half of the Syx1 SNARE domain is responsible for clamping of spontaneous release, stabilizing RRP, and also Ca2+-evoked release. The authors routinely utilized the chimeric approach to replace the SNARE domain of Syx1 with its paralogue Syx2 and analyzed the neuronal activity through electrophysiology. The data are straightforward and fruitful. The conclusions are reasonable.

      Strengths:

      The electrophysiology data that illustrate the important functions of Syx1 in clamping of spontaneous release, stabilizing RRP, and also Ca2+-evoked release were clear and convincing.

      Weaknesses:

      One weakness is that the authors did not go deep into the underlying molecular mechanisms experimentally, either because of a variety of complicated possibilities or limited space of the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public reviews

      Reviewer 1 (Public Review):

      Summary:

      The authors set out to clarify the molecular mechanism of endocytosis (re-uptake) of synaptic vesicle (SV) membrane in the presynaptic terminal following release. They have examined the role of presynaptic actin, and of the actin regulatory proteins diaphanous-related formins (mDia1/3), and Rho and Rac GTPases in controlling the endocytosis. They successfully show that presynaptic membrane-associated actin is required for normal SV endocytosis in the presynaptic terminal and that the rate of endocytosis is increased by activation of mDia1/3. They show that RhoA activity and Rac1 activity act in a partially redundant and synergistic fashion together with mDia1/3 to regulate the rate of SV endocytosis. The work adds substantially to our understanding of the molecular mechanisms of SV endocytosis in the presynaptic terminal.

      Strengths:

      The authors use state-of-the-art optical recording of presynaptic endocytosis in primary hippocampal neurons, combined with well-executed genetic and pharmacological perturbations to document effects of alteration of actin polymerization on the rate of SV endocytosis. They show that removal of the short amino-terminal portion of mDia1 that associates with the membrane interrupts the association of mDia1 with membrane actin in the presynaptic terminal. They then use a wide variety of controlled perturbations, including genetic modification of the amount of mDia1/3 by knock-down and knockout, combined with inhibition of activity of RhoA and Rac1 by pharmacological agents, to document the quantitative importance of each agent and their synergistic relationship in regulation of endocytosis.

      The analysis is augmented by ultrastructural analyses that demonstrate the quantitative changes in numbers of synaptic vesicles and in uncoated membrane invaginations that are predicted by the optical recordings.

      The manuscript is well-written and the data are clearly explained. Statistical analysis of the data is strengthened by the very large number of data points analyzed for each experiment.

      Weaknesses:

      There are no major weaknesses. The optical images as first presented are small and it is recommended that the authors provide larger, higher-resolution images.

      Response: We thank the referee for these highly positive remarks. In response, we now provide larger, high-resolution images as requested.

      Reviewer 2 (Public Review):

      Summary:

      This manuscript expands on previous work from the Haucke group which demonstrated the role of formins in synaptic vesicle endocytosis. The techniques used to address the research question are state-of-the-art. As stated above there is a significant advance in knowledge, with particular respect to Rho/Rac signalling.

      Strengths:

      The major strength of the work was to reveal new information regarding the control of both presynaptic actin dynamics and synaptic vesicle endocytosis via Rho/Rac cascades. In addition, there was further mechanistic insight regarding the specific function of mDia1/3. The methods used were state-of-the-art.

      Weaknesses:

      There are a number of instances where the conclusions drawn are not supported by the submitted data, or further work is required to confirm these conclusions.

      Response: We thank the referee for his/her thorough reading of the manuscript and the thoughtful comments and questions. We have conducted additional experiments and made textual changes to our manuscript to address these points and to further strengthen the conclusions, as detailed in our response to the recommendations for authors.

      Recommendations for the authors

      Reviewer 1 (Recommendations For The Authors):

      Most of the figures contain images that are too small to be easily interpreted because the resolution is degraded when they are enlarged in the PDF file. The authors should redesign the figures so that the letters marking each panel are smaller, and the size of each data panel is much larger (at least twice as large with increased resolution). There is, at present, a great deal of white space in most of the figures that should be reduced to make room for larger, higher-resolution images. Larger fonts should be used for annotations of the images so that they are easier to read. The data appears to be very high quality, but it is presented at a size and resolution that don't do it justice.

      Response: We thank the referee for his/her helpful comments. In response to the referee's comment, we have carefully re-arranged all figures and now provide larger, high-resolution images.

      Reviewer 2 (Recommendations For The Authors):

      Major points

      (1) Figure 1 - While there is a rationale for employing a cocktail of drugs to interfere with actin dynamics, it would be highly informative to determine the effect of these modulators in isolation. This is important, since in their previous publication (Soykan et al Neuron 2017 93:854) the authors demonstrated that latrunculin had no effect, while jasplakinolide accelerated endocytosis. This raises the possibility of the effect originating purely from Y-27632 and ROCK kinase inhibition, rather than destabilisation/stabilisation of actin. It will be key to dissect this by examining the effect on endocytosis of both 1) a cocktail of latrunculin/jasplakinolide and 2) Y-27632 alone.

      Response: We thank the referee for highlighting this interesting point. We have now experimentally addressed the effect of latrunculin (L), jasplakinolide (J) and the ROCK inhibitor Y-27632 (Y), either alone or in combination, on the kinetics of synaptic vesicle (SV) endocytosis (new Fig. 1-Supplement 1C,D). We now demonstrate that application of the ROCK inhibitor Y-27632 or of the combination of latrunculin (L) and jasplakinolide (J) has no effect on Syph-pH endocytosis. Combined use of jasplakinolide (J) and the ROCK inhibitor Y-27632 (Y) has a small phenotype. In contrast, a mix of all three inhibitors (JYL) potently impairs endocytosis kinetics at hippocampal synapses. These data demonstrate that actin dynamics are required for SV endocytosis, while ROCK inhibition alone does not appear to impair endocytosis kinetics. We note that our data are in line with a study by Ann Saal et al (2020) who reported a lack of effect of ROCK inhibition on the kinetics of Synaptotagmin1-CypHer retrieval.

      (2) Figure 1 - There are clear effects on the retrieval of pHluorin reporters and also endogenous vGAT in the presence of disruptors of actin function. However, there was no assessment of the impact of these interventions on either neurotransmitter release or SV fusion (with the exception of 1 condition with one stimulus train (Fig S1D), and the effect of Rac modulation in Fig S6F). As quoted by the authors, previous studies using knockout of beta- or gamma-actin have shown a profound effect on these parameters in hippocampal neurons, which has the potential to impact the speed and extent of compensatory endocytosis. The authors will already have this data from the use of the two reporters (pHluorin and vGAT-cypHer), and it is important to include this to allow interpretation of the effect on endocytosis observed.

      Response: We agree with the referee that this is an important point that we have tackled experimentally using vGAT-CypHer and synapto-pHluorin responses as measures. In the new Fig. 1-Supplement 1, Fig. 5-Supplement 1, and Fig. 6-Supplement 1 of our revised manuscript, we show that SV exocytosis is largely unaffected by any of the applied manipulations of actin function.

      Specifically, we have added surface normalized data as a surrogate measure for exocytosis for the following:

      • JLY treatment monitored by Syph-pH (Figure 1-Supplement 1A) and vGAT-CypHer (Figure 1-Supplement 1B),

      • shCTR/shmDia1 (transfected) assayed via Syph-pH (Figure 1-Supplement 1G),

      • shCTR/shmDia1/shmDia1+3 assayed via vGLUT1-pH (40AP: Figure 1-Supplement 1J; 80AP: Figure 1-Supplement 1L),

      • shCTR/shmDia1+3 (transduced) assayed by vGAT-CypHer (Figure 1-Supplement 1M),

      • IMM treatment monitored by vGLUT1-pH (Figure 1-Supplement 1O),

      • RhoA/B WT/DN overexpression monitored by Syph-pH (Figure 5-Supplement 1B),

      • shCTR/shRhoA+B (transfected) monitored via Syph-pH (Figure 5-Supplement 1D),

      • shCTR/shmDia1+3 +/- EHT 1864 (Rac Inhibitor) assayed by vGAT-CypHer (Figure 6-Supplement 1D),

      • shCTR/shmDia1+3 +/- Rac1-CA/DN assayed by Syph-pH (Figure 6-Supplement 1F).

      The lack of effect of these manipulations on exocytic SV fusion is thus distinct from the effects of complete abrogation of actin expression in beta- or gamma-actin knockout studies reported by the LingGang Wu laboratory (Neuron 2016) as the referee also noted.

      (3) Figure 3H, 3K, 4C, 4F - It is unclear how the values on the Y-axis were calculated. Regardless, to confirm that there is a specific increase in presynaptic mDia1/actin, the equivalent values for Homer/mDia1 should be presented (with Bassoon/Homer as a negative control). Without this, it is difficult to argue for a specific enrichment of mDia1/actin at the presynapse. The CRISPR experiments help with this interpretation (Fig 4G-I), however, inclusion of the Homer/mDia1 STED data would strengthen it greatly.

      Response: We apologize if the description has been unclear. We have essentially followed the same type of analysis as recently described by Bolz et al (2023). In brief, the rationale for quantifying presynaptic levels of the proteins of interest is as follows: The presynaptic area was defined by the normalized distribution curve of Bassoon, i.e. the area between 151.37 and -37.84 nm (marked by purple shading), with a cutoff set where the Bassoon and Homer1 distributions overlap (-37.84 nm), as shown in Figure 3-Supplement 1H (pasted below). The individual synaptic line profiles, e.g. of mDia1, were integrated to yield presynaptic levels (between 151.37 and -37.84 nm; purple shaded area) vs. postsynaptic levels (from -56.76 to -245.97 nm; green shaded area).

      New Figure 3-Supplement 1H-J:

      Author response image 1.
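
      The integration step described in this response can be sketched as follows. This is an illustrative reading of the description and not the authors' analysis code: the function name, the trapezoidal rule, the sampling grid and the example profile are all assumptions. Only the window boundaries (in nm) are taken from the response.

```python
def integrate_window(positions_nm, intensities, lo_nm, hi_nm):
    """Trapezoidal integral of a line profile over [lo_nm, hi_nm].

    Only segments whose two end points both lie inside the window are counted,
    so the window edges are effectively snapped to the sampling grid.
    """
    points = sorted(zip(positions_nm, intensities))
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if lo_nm <= x0 and x1 <= hi_nm:
            total += 0.5 * (y0 + y1) * (x1 - x0)
    return total

# Window boundaries quoted in the response (nm along the synaptic axis).
PRE_WINDOW = (-37.84, 151.37)     # presynaptic, Bassoon side
POST_WINDOW = (-245.97, -56.76)   # postsynaptic, Homer1 side

# Hypothetical profile sampled every 10 nm: intensity 2.0 on the
# presynaptic side and 1.0 on the postsynaptic side (arbitrary units).
positions = list(range(-250, 161, 10))
profile = [2.0 if x > -50 else 1.0 for x in positions]

pre_level = integrate_window(positions, profile, *PRE_WINDOW)
post_level = integrate_window(positions, profile, *POST_WINDOW)
```

      With this made-up profile the presynaptic integral is twice the postsynaptic one, mirroring the kind of pre- versus postsynaptic comparison the response describes.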

      Based on this analysis, postsynaptic mDia1 levels were also elevated upon Dynasore treatment (new Figure 3-Supplement 1I). In spite of this, and consistent with the fact that the majority of mDia1 is localized at the presynapse, we found that postsynaptic F-actin levels were unchanged in mDia1/3-depleted neurons (p = 0.0966; One sample t-test) (new Figure 4-Supplement 1E,F).

      New Figure 4-Supplement 1E,F:

      Author response image 2.

      Moreover, we also conducted further analysis with respect to possible effects of Dynasore on synaptic architecture in general. Neither presynaptic Bassoon nor postsynaptic Homer1 levels were significantly altered by Dynasore treatment (new Figure 3–Supplement 1J).

      (4) Figure 4J - The rescue of the pHluorin response by jasplakinolide is difficult to interpret when considering previous work from the same authors. In their 2017 publication (Soykan et al Neuron 2017 93:854), they revealed that the drug accelerated the pHluorin response, whereas now they demonstrate no effect in the control condition. If the drug does accelerate endocytosis, then it may be working via a different mechanism to restore endocytosis in mDia1/3 knockdown neurons.

      Response: The referee is correct. The very mild acceleration of endocytosis in the presence of jasplakinolide can be observed using synaptophysin-pHluorin as a reporter under moderate, medium-frequency stimulation at 10Hz for 5 s (i.e. 50 APs). In the present dataset, using a different pHluorin reporter (i.e. vGLUT1-pHluorin) that tends to yield faster endocytic responses (as noted before by the Ryan lab) and a high-frequency stimulus (20Hz), we fail to observe a significant effect. While this cannot be excluded, we would be reluctant to conclude that these differences indicate distinct mechanisms of jasplakinolide action. Alternatively, actin may be of particular importance under conditions of high-frequency stimulation.

      In this regard, the conclusions from the pHluorin experiment would be greatly strengthened by demonstrating that jasplakinolide corrects the reduction of presynaptic actin in mDia1/3 knockdown synapses observed in figures 4E-I.

      Response: As demonstrated in Figure 4-Supplement 1G and in support of a common mechanism of action, we find that application of jasplakinolide rescues reduced presynaptic actin levels in mDia1/3-depleted neurons. The respective data for presynaptic actin (normalized to shCTR + DMSO set to 100) are: shCTR + DMSO = 100 ± 6.3; shmDia1+3 + DMSO = 47.7 ± 4.3; shCTR + Jasp = 150.6 ± 11.9; shmDia1+3 + Jasp = 94.3 ± 11.5. These data are now also quoted in the revised manuscript text.
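
      For orientation, the comparisons implied by these normalized means can be worked through with simple arithmetic. The sketch below uses only the four mean values quoted in the response. The helper name is hypothetical, the error terms are ignored, and no statistical inference is implied.

```python
# Mean presynaptic actin levels quoted in the response (shCTR + DMSO set to 100).
actin = {
    ("shCTR", "DMSO"): 100.0,
    ("shmDia1+3", "DMSO"): 47.7,
    ("shCTR", "Jasp"): 150.6,
    ("shmDia1+3", "Jasp"): 94.3,
}

def percent_of(value, reference):
    """Express value as a percentage of reference."""
    return value / reference * 100.0

# Loss of presynaptic actin upon mDia1+3 depletion in DMSO, in percent of control.
loss_in_dmso = 100.0 - percent_of(actin[("shmDia1+3", "DMSO")], actin[("shCTR", "DMSO")])

# Depleted neurons treated with jasplakinolide, relative to the untreated control.
rescued_vs_control = percent_of(actin[("shmDia1+3", "Jasp")], actin[("shCTR", "DMSO")])

# Depleted neurons treated with jasplakinolide, relative to jasplakinolide-treated control.
rescued_vs_jasp_control = percent_of(actin[("shmDia1+3", "Jasp")], actin[("shCTR", "Jasp")])
```

      Read this way, jasplakinolide brings the mean of depleted neurons from 47.7 back to 94.3 on the scale where untreated control is 100, while the same value corresponds to roughly 63% of the jasplakinolide-treated control.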

      Minor points

      (1) There is no rationale provided regarding why different stimulation protocols are sometimes used in the pHluorin/cypHer experiments. In most cases it is 200 APs (40 Hz), however, in some cases, it is 40 APs or 80 APs. Can the authors explain why they used these different protocols?

      Response: The referee noted this correctly. This in part reflects the history of the project, in which initial datasets were acquired using 200 AP trains and pHluorin reporters. To probe whether the phenotypic effects induced by actin perturbations were robust over different stimulation paradigms and optical reporters, additional data using either 40 or 80 AP trains, as well as experiments capitalizing on vGLUT1 or endogenous vGAT monitored by pH-sensitive cypHer-labeled antibodies, were acquired. We hope the referee agrees that these additional data add to the general importance of our study.

      (2) Figure 2 - The reduction in SV density in mDia1/3 knockdown neurons correlates with the results in Figures 1 and 7. However, a functional consequence of this reduction (change in size of RRP or neurotransmitter release, as stated above) would have increased the impact of these experiments.

      Response: We agree with the referee and will address this interesting possibility using electrophysiological recordings in future studies.

      (3) It appears the experimental n in Figure 2 is profiles, rather than experiments. This should be clarified, especially since there is no reference to how many times the experiments in Fig2E-G were performed.

      Response: This point has been clarified in the revised figure legend.

      (4) Figure 6 - The authors state that inhibition of Rac function either via a dominant negative mutant or an inhibitor increases the inhibition of endocytosis via knockdown of mDia1/3. However, both interventions inhibit endocytosis themselves in the control condition. It would be informative to see the full statistical analysis of this data since there does not appear to be a significant additive effect when comparing Rac inhibition with the additional knockdown of mDia1/3.

      Response: In our revised manuscript, we now provide the full statistical analysis in the revised Source Data Table for Figures 6G,H. We observe that Rac1-DN expression indeed further aggravates phenotypes elicited by depletion of mDia1+3, but not vice versa. We have modified the corresponding section in the results section of our revised manuscript accordingly.

      (5) Figure 7 - The increase in endosomes in mDia1/3 knockdown neurons is consistent with previous studies examining pharmacological inhibition of formins (Soykan et al Neuron 2017 93:854). However, it is noted that these structures were absent in the images shown in Figure 2. Similar to the previous point in figure 6, a full reporting of the significance of different conditions is important here, since it appears that the only difference between EHT1864 and its co-incubation with mDia1/3 knockdown neurons is in the number of ELVs (Fig 7H).

      Response: Similar to the example EM images shown in Figure 7, enlarged endocytic structures are also observed in shmDia1+3 depleted synapses shown in Figure 2. However, ELVs and membrane invaginations were not color-coded as the focus in figure 2 is on the reduction of the SV pool. To better illustrate this, we have chosen a more representative example of this phenotype in revised Figure 2.

      Moreover, we now provide the full statistical analysis of EM phenotypes in the revised Source Data Table for Figure 7. We find that Rac1 inhibition indeed significantly aggravates the effects of mDia1+3 loss with respect to the accumulation of membrane invaginations, while the effect on ELVs remains insignificant. However, accumulation of ELVs in the presence of the Rac1 inhibitor EHT1864 is further aggravated upon depletion of mDia1+3. We have modified the corresponding section in the results section of our revised manuscript accordingly.

      We speculate that Rac1 may thus predominantly act at the plasma membrane, whereas mDia1/3 may serve additional functions in SV reformation at the level of ELVs. Clearly, further studies would be needed to test this idea in the future.

    2. eLife assessment

      This manuscript provides convincing evidence for the involvement of membrane actin, and its regulatory proteins, mDia1/3, RhoA, and Rac1 in the mechanism of synaptic vesicle re-uptake (endocytosis). These important data fill a gap in the understanding of how the regulation of actin dynamics and endocytosis are linked. The manuscript will be of interest to all scientists working on cellular trafficking and membrane remodeling.

    3. Reviewer 1 Public Review:

      Summary:

      The authors set out to clarify the molecular mechanism of endocytosis (re-uptake) of synaptic vesicle (SV) membrane in the presynaptic terminal following release. They have examined the role of presynaptic actin, and of the actin regulatory proteins diaphanous-related formins (mDia1/3), and Rho and Rac GTPases in controlling the endocytosis. They successfully show that presynaptic membrane-associated actin is required for normal SV endocytosis in the presynaptic terminal, and that the rate of endocytosis is increased by activation of mDia1/3. They show that RhoA activity and Rac1 activity act in a partially redundant and synergistic fashion together with mDia1/3 to regulate the rate of SV endocytosis. The work adds substantially to our understanding of the molecular mechanisms of SV endocytosis in the presynaptic terminal.

      Strengths:

      The authors use state-of-the-art optical recording of presynaptic endocytosis in primary hippocampal neurons, combined with well-executed genetic and pharmacological perturbations to document effects of alteration of actin polymerization on the rate of SV endocytosis. They show that removal of the short amino-terminal portion of mDia1 that associates with the membrane interrupts the association of mDia1 with membrane actin in the presynaptic terminal. They then use a wide variety of controlled perturbations, including genetic modification of the amount of mDia1/3 by knock-down and knockout, combined with inhibition of activity of RhoA and Rac1 by pharmacological agents, to document the quantitative importance of each agent, and their synergistic relationship in regulation of endocytosis.

      The analysis is augmented by ultrastructural analyses that demonstrate the quantitative changes in numbers of synaptic vesicles and in uncoated membrane invaginations that are predicted by the optical recordings. The manuscript is well-written and the data are clearly explained. Statistical analysis of the data is strengthened by the very large number of data points analyzed for each experiment.

      Weaknesses:

      There are no major weaknesses.

    4. Reviewer 2 Public Review:

      Summary:

      This manuscript expands previous work from the Haucke group which demonstrated the role of formins in synaptic vesicle endocytosis. The techniques used to address the research question are state-of-the-art. As stated above there is a significant advance in knowledge, with particular respect to Rho/Rac signalling.

      Strengths:

      The major strength of the work was to reveal new information regarding the control of both presynaptic actin dynamics and synaptic vesicle endocytosis via Rho/Rac cascades. In addition, there was further mechanistic insight regarding the specific function of mDia1/3. The methods used were state-of-the-art.

      Weaknesses:

      There are no major weaknesses.

    1. Author Response

      We thank all three Reviewers and the editors for the time and effort they put in reading and critiquing the manuscript. Our revised manuscript includes new data and analyses that address the original concerns. These include, 1) a new Supplemental Figure characterizing Cre expression and cellular phenotypes in the hippocampus, 2) new tables that give a more comprehensive picture of the EEG recordings and statistical analyses, 3) addition of whole cell electrophysiology data, and 4) rewriting to ensure that we do not state that either mTORC1 or mTORC2 hyperactivation is sufficient to cause epilepsy. We discuss the issue of statistical power to detect reduction in generalized seizure rate in the responses below. These suggestions and additions have improved the paper and we hope they will raise both significance and strength of support for the conclusions.

      Reviewer #1 (Public Review):

      Hyperactivation of mTOR signaling causes epilepsy. It has long been assumed that this occurs through overactivation of mTORC1, since treatment with the mTORC1 inhibitor rapamycin suppresses seizures in multiple animal models. However, the recent finding that genetic inhibition of mTORC1 via Raptor deletion did not stop seizures while inhibition of mTORC2 did, challenged this view (Chen et al, Nat Med, 2019). In the present study, the authors tested whether mTORC1 or mTORC2 inhibition alone was sufficient to block the disease phenotypes in a model of somatic Pten loss-of-function (a negative regulator of mTOR). They found that inactivation of either mTORC1 or mTORC2 alone normalized brain pathology but did not prevent seizures, whereas dual inactivation of mTORC1 and mTORC2 prevented seizures. As the functions of mTORC1 versus mTORC2 in epilepsy remain unclear, this study provides important insight into the roles of mTORC1 and mTORC2 in epilepsy caused by Pten loss and adds to the emerging body of evidence supporting a role for both complexes in the disease development.

      Strengths:

      The animal models and the experimental design employed in this study allow for a direct comparison between the effects of mTORC1, mTORC2, and mTORC1/mTORC2 inactivation (i.e., same animal background, same strategy and timing of gene inactivation, same brain region, etc.). Additionally, the conclusions on brain epileptic activity are supported by analysis of multiple EEG parameters, including seizure frequencies, sharp wave discharges, interictal spiking, and total power analyses.

      Weaknesses:

      (1) The sample size of the study is small and does not allow for the assessment of whether mTORC1 or mTORC2 inactivation reduces seizure frequency or incidence. This is a limitation of the study.

      We agree that this is a minor limitation of the present study. However, for several reasons we decided not to pursue this question by increasing the number of animals. First, we performed a power analysis of the existing data. This analysis showed that we would need to use 89 animals per group to detect a significant difference (0.8 Power, p = 0.05, Mann-Whitney test) in the frequency of generalized seizures in the Pten-Raptor group and 31 animals per group in the Pten-Rictor group versus Pten alone. It is simply not feasible to perform video-EEG monitoring on this many animals for a single study. Second, even if we did do enough experiments to detect a reduction in seizure frequency, it is clear that neither Rptor nor Rictor deletion provides the kind of normalization in brain activity that we seek in a targeted treatment. Both Pten-Rptor and Pten-Rictor animals still have very frequent spike-wave events (Fig. 3D) and highly abnormal interictal EEGs (Fig. 4), suggesting that even if generalized seizures were reduced, epileptic brain activity persists. This is in contrast to the triple KO animals, which have no increase in SWD above control level and very normal interictal EEG.
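For readers unfamiliar with how a sample-size estimate for a Mann-Whitney comparison can be obtained, the following is a minimal simulation sketch. It is not the authors' power analysis and will not reproduce their figures of 89 and 31 animals per group: the functions `mann_whitney_p` and `simulated_power`, the normal approximation used for the p-value, and the Gaussian samplers in the usage note are all illustrative assumptions, whereas real seizure frequencies would be skewed count data drawn from the pilot recordings.

```python
import math
import random
from typing import Callable, Sequence


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney p-value (normal approximation, tie and continuity corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for a block of ties
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


def simulated_power(sample_a: Callable[[random.Random], float],
                    sample_b: Callable[[random.Random], float],
                    n_per_group: int, n_sim: int = 1000,
                    alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated experiments in which the test rejects at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [sample_a(rng) for _ in range(n_per_group)]
        y = [sample_b(rng) for _ in range(n_per_group)]
        if mann_whitney_p(x, y) < alpha:
            hits += 1
    return hits / n_sim
```

A required group size would then be found by increasing `n_per_group` until `simulated_power(...)` first reaches 0.8, for example with hypothetical samplers such as `lambda rng: rng.gauss(0.0, 1.0)` and `lambda rng: rng.gauss(0.5, 1.0)`.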

      (2) The authors describe that they inactivated mTORC1 and mTORC2 in a new model of somatic Pten loss-of-function in the cortex. This is slightly misleading since Cre expression was found both in the cortex and the underlying hippocampus, as shown in Figure 1. Throughout the manuscript, they provide supporting histological data from the cortex. However, since Pten loss-of-function in the hippocampus can lead to hippocampal overgrowth and seizures, data showing the impact of the genetic rescue in the hippocampus would further strengthen the claim that neither mTORC1 nor mTORC2 inactivation prevents seizures.

      Thank you for pointing out this issue. Cre expression was observed in both the cortex and the dorsal hippocampus in most animals, and we agree that differences in cortical versus hippocampal mTOR signaling could have differential contributions to epilepsy. We initially focused our studies on the cortex because spike-and-wave discharge, the most frequent and fully penetrant EEG phenotype in our model, is associated with cortical dysfunction. In our revised submission we have included a new Figure that quantifies Cre expression in the hippocampal subfields, as well as pS6, pAkt and soma size. These new data show that the amount of Cre expression in the hippocampus is not related to the occurrence of generalized seizures. The pattern of cell size changes in hippocampal neurons is the same as observed in cortical neurons. The levels of pS6 and pAkt are not much changed in the hippocampus, likely due to the sparse Cre expression there. We interpret these findings as supporting the conclusion that the reason we do not see seizure prevention by mTORC1 or mTORC2 inactivation is not due to hippocampal-specific dysfunction.

      (3) Some of the methods for the EEG seizure analysis are unclear. The authors describe that for control and Pten-Raptor-Rictor LOF animals, all 10-second epochs in which signal amplitude exceeded 400 μV at two time-points at least 1 second apart were manually reviewed, whereas, for the Pten LOF, Pten-Raptor LOF, and Pten-Rictor LOF animals, at least 100 of the highest-amplitude traces were manually reviewed. Does this mean that not all flagged epochs were reviewed? This could potentially lead to missed seizures.

      We reviewed at least 48 hours of data from each animal manually. All seizures that were identified during manual review were also identified by the automated detection program. It is possible but unlikely that there are missed seizures in the remaining data. We have added these details to the Methods of the revised submission.
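To make the flagging criterion discussed in point (3) concrete, here is a minimal sketch of one way such a rule could be implemented. It is not the authors' detection program. The function name `flag_epochs`, the use of non-overlapping epochs, the use of absolute amplitude, and the sampling rate passed in by the caller are all assumptions made for illustration.

```python
from typing import List, Sequence


def flag_epochs(signal_uv: Sequence[float], fs_hz: float,
                epoch_s: float = 10.0, threshold_uv: float = 400.0,
                min_gap_s: float = 1.0) -> List[int]:
    """Return indices of epochs containing two supra-threshold samples
    separated by at least min_gap_s seconds.

    Assumptions (not taken from the source): epochs are non-overlapping,
    and the threshold is applied to the absolute amplitude in microvolts.
    """
    epoch_len = int(round(epoch_s * fs_hz))
    min_gap = int(round(min_gap_s * fs_hz))
    flagged = []
    for k in range(len(signal_uv) // epoch_len):
        start = k * epoch_len
        above = [i for i in range(start, start + epoch_len)
                 if abs(signal_uv[i]) > threshold_uv]
        # The first and last crossings are the farthest-apart pair in the epoch.
        if above and above[-1] - above[0] >= min_gap:
            flagged.append(k)
    return flagged
```

Flagged epochs would still need manual review, as described in the response above; the sketch only narrows down which 10-second windows a reviewer looks at.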

      (4) Additionally, the inclusion of how many consecutive hours were recorded among the ~150 hours of recording per animal would help readers with the interpretation of the data.

      Thank you for this recommendation. Our revised submission includes a table with more information about the EEG recordings. The number of consecutive hours recorded varied because the wireless system depends on battery life, which was inconsistent, but each animal was recorded for at least 48 consecutive hours on at least two occasions.

      (5) Finally, it is surprising that mTORC2 inactivation completely rescued cortical thickness since such pathological phenotypes are thought to be conserved down the mTORC1 pathway. Additional comments on these findings in the Discussion would be interesting and useful to the readers.

      We agree that the relationship between mTORC2, cortical thickness, and growth in general is an interesting topic with conflicting results in the literature. We didn’t add anything to the Discussion along these lines because we are up against word limits, but comment here that soma size was increased 120% by Pten inactivation and partially normalized to a 60% increase from Controls by mTORC2 inactivation (Fig. 2C). We and others have previously shown that mTORC2 inactivation (Rictor deletion) in neurons reduces brain size, neuron soma size, and dendritic outgrowth (PMIDs: 36526374, 32125271, 23569215). In our revised submission we also include new data showing that the membrane capacitance of Pten-Ric LOF neurons is normal. Thus, we do not find it completely surprising that mTORC2 inactivation reduces the cortical thickness increase caused by Pten loss. There may still be a slight increase in cortical thickness in Pten-Rictor animals, but it is statistically indistinguishable from Controls.

      Reviewer #2 (Public Review):

      Summary:

      The study by Cullen et al presents intriguing data regarding the contribution of mTOR complex 1 (mTORC1) versus mTORC2 or both in Pten-null-induced macrocephaly and epileptiform activity. The role of mTORC2 in mTORopathies, and in particular Pten loss-of-function (LOF)-induced pathology and seizures, is understudied and controversial. In addition, recent data provided evidence against the role of mTORC1 in PtenLOF-induced seizures. To address these controversies and the contribution of these mTOR complexes in PtenLOF-induced pathology and seizures, the authors injected an AAV9-Cre into the cortex of conditional single, double, and triple transgenic mice at postnatal day 0 to remove Pten, Pten+Raptor or Rictor, and Pten+raptor+rictor. Raptor and Rictor are essentially binding partners of mTORC1 and mTORC2, respectively. One major finding is that despite preventing mild macrocephaly and increased cell size, Raptor knockout (KO, decreased mTORC1 activity) did not prevent the occurrence of seizures and the rate of SWD events, and aggravated seizure duration. Similarly, Rictor KO (decreased mTORC2 activity) partially prevented mild macrocephaly and increased cell size but did not prevent the occurrence of seizures and did not affect seizure duration. However, Rictor KO reduced the rate of SWD events. Finally, the pathology and seizure/SWD activity were fully prevented in the double KO. These data suggest the contribution of both increased mTORC1 and mTORC2 in the pathology and epileptic activity of Pten LOF mice, emphasizing the importance of blocking both complexes for seizure treatment. Whether these data apply to other mTORopathies due to Tsc1, Tsc2, mTOR, AKT or other gene variants remains to be examined.

      Strengths:

      The strengths are as follows: 1) they address an important and controversial question that has clinical application, 2) the study uses a reliable and relatively easy method to KO specific genes in cortical neurons, based on AAV9 injections in pups, 3) they perform careful video-EEG analyses correlated with some aspects of cellular pathology.

      Weaknesses:

      The study nevertheless has a few weaknesses: 1) the conclusions are perhaps a bit overstated. The data do not show that increased mTORC1 or mTORC2 are sufficient to cause epilepsy. However, the data clearly show that both increased mTORC1 and mTORC2 activity contribute to the pathology and seizure activity and as such are necessary for seizures to occur.

      We agree that our findings do not directly show that either mTORC1 or mTORC2 hyperactivity is sufficient to cause seizures, as we do not individually hyperactivate each complex in the absence of any other manipulation. We interpreted our findings in this model as suggesting that either is sufficient based on the result that there is no epileptic activity when both are inactivated, and thus assumed that there is not a third, mTOR-independent, mechanism that is contributing to epilepsy in Pten, Pten-Raptor, and Pten-Rictor animals. In addition, the histological data show that Raptor and Rictor loss each normalize activity through mTORC1 and mTORC2 respectively, suggesting that one in the absence of the other is sufficient. However, we agree that there could be other potential mTOR-independent pathways downstream of Pten loss that contribute to epilepsy. We have revised the manuscript to reflect this.

      (2) The data related to the EEG would benefit from having more mice. Adding more mice would have helped determine whether there was a decrease in seizure activity with the Rictor or Raptor KO.

      Please see response to Reviewer 1’s first Weakness.

      (3) It would have been interesting to examine the impact of mTORC2 and mTORC1 overexpression related to point #1 above.

      We are not sure that overexpression of individual components of mTORC1 or mTORC2 would result in their hyperactivation or lead to increases in downstream signaling. We believe that cleanly and directly hyperactivating mTORC1 or especially mTORC2 in vivo without affecting the other complex or other potential interacting pathways is a difficult task. Previous studies have used mTOR gain-of-function mutations as a means to selectively activate mTORC1 or pharmacological agents to selectively activate mTORC2, but it is not clear to us that the former does not affect mTORC2 activity as well, or that the latter achieves activation of mTORC2 targets other than p-Akt 473, or that it is truly selective. We agree that these would be key experiments to further test the sufficiency hypothesis, but the amount of work that would be required to perform them is more than what we can do in this Short Report.

      Reviewer #3 (Public Review):

      Summary: This study investigated the role of mTORC1 and 2 in a mouse model of developmental epilepsy which simulates epilepsy in cortical malformations. Given activation of genes such as PTEN activates TORC1, and this is considered to be excessive in cortical malformations, the authors asked whether inactivating mTORC1 and 2 would ameliorate the seizures and malformation in the mouse model. The work is highly significant because a new mouse model is used where Raptor and Rictor, which regulate mTORC1 and 2 respectively, were inactivated in one hemisphere of the cortex. The work is also significant because the deletion of both Raptor and Rictor improved the epilepsy and malformation. In the mouse model, the seizures were generalized or there were spike-wave discharges (SWD). They also examined the interictal EEG. The malformation was manifested by increased cortical thickness and soma size.

      Strengths: The presentation and writing are strong. The quality of data is strong. The data support the conclusions for the most part. The results are significant: Generalized seizures and SWDs were reduced when both Torc1 and 2 were inactivated but not when one was inactivated.

      Weaknesses: One of the limitations is that it is not clear whether the area of cortex where Raptor or Rictor were affected was the same in each animal.

      Our revised submission includes new data showing that the area of affected cortex and hippocampus are similar across groups. (Figure 1A and Supplementary Figure 1)

      Also, it is not clear which cortical cells were measured for soma size.

      Soma size was measured by dividing Nissl stain images into a 10 mm² grid. The somas of all GFP-expressing cells fully within three randomly selected grid squares in Layer II/III were manually traced. Three sections per animal at approximately Bregma -1.6, -2.1, and -2.6 were used. As Cre expression was driven by the hSyn promoter, these cells include both excitatory and inhibitory cortical neurons.

      Another limitation is that the hippocampus was affected as well as the cortex. One does not know the role of cortex vs. hippocampus. Any discussion about that would be good to add.

      See response to Reviewer 1’s second Weakness.

      It would also be useful to know if Raptor and Rictor are in glia, blood vessels, etc.

      Raptor and Rictor are thought to be ubiquitously active in mammalian cells including glia and endothelial cells. Previous studies have shown that mTOR manipulation can affect astrocyte function and blood vessel organization, however, our study induced gene knockout using an AAV that expressed Cre under control of the hSyn promoter, which has previously been shown to be selective for neurons. Manual assessment of Cre expression compared with DAPI, NeuN, and GFAP stains suggested that only neurons were affected.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      In addition to the comments in the public review, it is recommended that the authors provide a more representative figure for p-Akt staining in the Pten LOF condition in Figure 1 D2. The current figure is not convincing.

      Thanks for the suggestion. We have replaced the images with zoomed-in panels that better demonstrate the difference.

      Additionally, in the last paragraph of the discussion, there is a reference error to an incorrect paper (reference 18) that should be corrected.

      Thanks, corrected.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Comment 1: Some statements need clarifications or changes.

      (1) Abstract: "spontaneous seizures and epileptiform activity persisted despite mTORC1 or mTORC2 inactivation alone but inactivating both mTORC1 and mTORC2 normalized pathology." Did inactivation of only one also normalize the pathology? Did inactivating both normalize the seizures? Pathology is not equal to seizures.

      We have altered this statement to avoid ambiguity.

      (2) Abstract: "These results suggest that hyperactivity of both mTORC1 and mTORC2 are sufficient to cause epilepsy,". Based on the abstract, it is not clear that it is sufficient. It is necessary.

      We have altered this statement by removing the term “sufficient.”

      (3) "Thus, there is strong evidence that hyperactivation of mTORC1 downstream of PTEN disruption causes the macrocephaly, epilepsy, early mortality, and synaptic dysregulation observed in humans and model organisms [17]" I would suggest adding that the strongest evidence is that mTOR GOF mutations lead to the same pathology and epilepsy, suggesting mTORC1 is sufficient. The other findings suggest that it is necessary.

      Unless we misunderstand the Reviewer’s point, we believe this viewpoint is already encompassed by the preceding text that “These phenotypes resemble those observed in models of mTORC1-specific hyperactivation.”

      (4) Introduction (end): "suggesting that hyperactivity of either complex can lead to neuronal hyperexcitability and epilepsy".

      Comment 2: I do not agree with the title based on comment 1 above. You did not provide evidence that the mTORCs cause seizures. Your data suggest that they are necessary for seizures or contribute to seizures, but there is no evidence that mTORC2 can induce seizure.

      We softened the title by replacing “cause” with “mediate.”

      Comment 3: Fig. 1B. Could you better describe the affected regions? I can see other regions than just the cortex and hippocampus.

      Almost all affected cell bodies were in the cortex and hippocampus. The virus in the image is cell-filling and as such projections from affected neurons throughout the brain can also be seen. We have clarified this in the figure legend.

      Comment 4: I feel unease about the number of animals recorded for EEG to assess seizure frequency. There is not enough power to draw clear conclusions. So, please make sure to not oversell your findings since it is all-or-nothing data (seizure or no seizure) in this case and the seizure frequency could very well be decreased with single mTOR LOF, but it is impossible to conclude. Maybe discuss this limitation of your study.

      We have addressed this in the public comments response.

      Minor:

      (1) Pten LOF: define the abbreviation.

      Done

      (2) Make sure that gene name in mice are not capitalized and italicized.

      OK

      (3) Fig 1C: could you specify in the results where the analysis was done.

      Detail added to Methods (to keep Results concise for word limit)

      (4) In the subtitle: "Concurrent mTORC1/2 inactivation, but neither alone, rescues epilepsy and interictal EEG abnormalities in focal Pten LOF". Replace "rescues" with "prevents". This is not a rescue experiment since the LOF is done at the same time.

      OK

      (5) "GS did not appear to be correlated with mTOR pathway activity (Supplementary Figure 2)." Could you please do a proper correlation analysis, by plotting all the values as a function of seizure frequency independent of the condition? There is also no correlation between pAkt and seizures.

      Here are those data in Author response image 1. They are now part of Supplementary Figure 2.

      Author response image 1.
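As a sketch of the kind of condition-independent correlation analysis requested in point (5), a rank-based (Spearman) coefficient can be computed across all animals pooled together. This is only an illustration, not the analysis behind Author response image 1: the choice of Spearman rather than another coefficient, and the helper names `average_ranks` and `spearman_rho`, are assumptions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)
```

Here `x` would hold one signaling measure per animal (for example a pS6 or pAkt intensity) and `y` the matching seizure frequency, with animals from all genotypes pooled.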

      Reviewer #3 (Recommendations For The Authors):

      Figures 1 D, and E show images that are too small to judge. Where are the layers? Please add marks.

      We replaced these images with larger zoomed in images to show group differences more clearly. The images no longer show multiple differentiable cortical layers.

      If Fig 1 characterizes the model, where is the seizure data? When did they start? Where did they start? Was the focus of the cortical area affected by PTEN loss of function?

      Updated figure name to reflect content. Information about the seizure phenotypes is included in Figure 3.

      Figure 2 The font size for the calibration is too small. The correlations are hard to see. Colors are not easy to discriminate.

      We edited the figure to correct these problems.

      Figure 3 shows a clear effect on generalized seizures but the text of the Results does not reflect that.

      We wanted to be cautious about interpreting these data based on the issue raised by other reviewers that they are underpowered to detect seizure reduction in the Pten-Raptor and Pten-Rictor groups. We have updated the language to attempt to strike a better balance between over- and under-interpretation. We also performed an additional analysis of the occurrence of generalized seizures to emphasize that only Control and PtRapRic animals have significantly lower seizure occurrence than Pten LOF mice (Fig 3C).

      For interictal power, was the same behavioral state chosen? Was a particular band affected?

      Epochs to be analyzed were selected automatically and were agnostic to behavioral state. Band-specific effects are outlined in Figure 4B and Table [2].

      There is no information about whether the model exhibits altered sleep, food intake, weight, etc.

      We didn’t collect information on food intake. It would be possible to look at sleep from the EEG, but that is not something that we are prepared to do at this point. Weight at endpoint was not different between genotypes but we did not collect longitudinal data on weight.

      Were the sexes different?

      Included in new Table [1]

      Where were EEG electrodes and were they subdural or not?

      Additional detail on this has been added to Methods. The screws are placed in the skull but above the dura.

      How long were continuous EEG records- the method just says 150 hr. per mouse in total.

      Included in new Table [1]

      The statistics don't discuss power, normality, whether variance was checked to ensure it did not differ significantly between groups, or whether data are mean ± SEM or SD. For ANOVAs, were there multifactorial comparisons and what were F, df, and p values? Exact p for post hoc tests?

      We have added a new table (Table [3]) that gives information on the exact test used, F, p values, and exact p for post hoc tests. Information regarding power, normality, variance, post-tests and multiple comparisons corrections has been added to Methods section “Statistical Analysis.”

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Visual Perceptual Learning (VPL) results in varying degrees of generalization to tasks or stimuli not seen during training. The question of which stimulus or task features predict whether learning will transfer to a different perceptual task has long been central in the field of perceptual learning, with numerous theories proposed to address it. This paper introduces a novel framework for understanding generalization in VPL, focusing on the form invariants of the training stimulus. Contrary to a previously proposed theory that task difficulty predicts the extent of generalization - suggesting that more challenging tasks yield less transfer to other tasks or stimuli - this paper offers an alternative perspective. It introduces the concept of task invariants and investigates how the structural stability of these invariants affects VPL and its generalization. The study finds that tasks with high-stability invariants are learned more quickly. However, training with low-stability invariants leads to greater generalization to tasks with higher stability, but not the reverse. This indicates that, at least based on the experiments in this paper, an easier training task results in less generalization, challenging previous theories that focus on task difficulty (or precision). Instead, this paper posits that the structural stability of stimulus or task invariants is the key factor in explaining VPL generalization across different tasks.

      Strengths:

      • The paper effectively demonstrates that the difficulty of a perceptual task does not necessarily correlate with its learning generalization to other tasks, challenging previous theories in the field of Visual Perceptual Learning. Instead, it proposes a significant and novel approach, suggesting that the form invariants of training stimuli are more reliable predictors of learning generalization. The results consistently bolster this theory, underlining the role of invariant stability in forecasting the extent of VPL generalization across different tasks.

      • The experiments conducted in the study are thoughtfully designed and provide robust support for the central claim about the significance of form invariants in VPL generalization.

      Weaknesses:

      • The paper assumes a considerable familiarity with the Erlangen program and the definitions of invariants and their structural stability, potentially alienating readers who are not versed in these concepts. This assumption may hinder the understanding of the paper's theoretical rationale and the selection of stimuli for the experiments, particularly for those unfamiliar with the Erlangen program's application in psychophysics. A brief introduction to these key concepts would greatly enhance the paper's accessibility. The justification for the chosen stimuli and the design of the three experiments could be more thoroughly articulated.

      Response: We appreciate the reviewer's feedback regarding the accessibility of our paper. In response to this feedback, we plan to enhance the introduction section of our paper to provide a concise yet comprehensive overview of the key concepts of Erlangen program. Additionally, we will provide a more thorough justification for the selection of stimuli and the experimental design in our revised version, ensuring that readers understand the rationale behind our choices.

      • The paper does not clearly articulate how its proposed theory can be integrated with existing observations in the field of VPL. While it acknowledges previous theories on VPL generalization, the paper falls short in explaining how its framework might apply to classical tasks and stimuli that have been widely used in the VPL literature, such as orientation or motion discrimination with Gabors, vernier acuity, etc. It also does not provide insight into the application of this framework to more naturalistic tasks or stimuli. If the stability of invariants is a key factor in predicting a task's generalization potential, the paper should elucidate how to define the stability of new stimuli or tasks. This issue ties back to the earlier mentioned weakness: namely, the absence of a clear explanation of the Erlangen program and its relevant concepts.

      Response: Thanks for highlighting the need for better integration of our proposed theory with existing observations in the field of VPL. Unfortunately, the theoretical framework proposed in our study is based on Klein’s Erlangen program and is only applicable to geometric shape stimuli. For VPL studies using stimuli and paradigms that are completely unrelated to geometric transformations (such as motion discrimination with Gabors or random dots, vernier acuity, spatial frequency discrimination, contrast detection or discrimination, etc.), our proposed theory does not apply. Some stimuli employed by VPL studies can be classified into certain geometric invariants. For instance, orientation discrimination with Gabors (Dosher & Lu, 2005) and the texture discrimination task (F. Wang et al., 2016) both belong to tasks involving Euclidean invariants, and circle versus square discrimination (Kraft et al., 2010) belongs to tasks involving affine invariance. However, these studies do not simultaneously involve multiple geometric invariants of varying levels of stability, and thus cannot be directly compared with our research. It is worth noting that while Klein’s hierarchy of geometries, which our study focuses on, is rarely mentioned in the field of VPL, it does have connections with concepts such as 'global/local', 'coarse/fine', 'easy/difficult', and 'complex/simple': more stable invariants are closer to 'global', 'coarse', 'easy', and 'complex', while less stable invariants are closer to 'local', 'fine', 'difficult', and 'simple'. Importantly, several VPL studies have found ‘fine-to-coarse’ or ‘local-to-global’ asymmetric transfer (Chang et al., 2014; N. Chen et al., 2016; Dosher & Lu, 2005), which seems consistent with the results of our study.

      In the introduction section of our revised version and subsequent full author response, we will provide a clear explanation of the Erlangen program and elucidate how to define the stability of new stimuli or tasks. In the discussion section of our revised version, we will compare our results to other studies concerned with the generalization of perceptual learning and speculate on how our proposed theory fits with existing observations in the field of VPL.

      • The paper does not convincingly establish the necessity of its introduced concept of invariant stability for interpreting the presented data. For instance, consider an alternative explanation: performing in the collinearity task requires orientation invariance. Therefore, it's straightforward that learning the collinearity task doesn't aid in performing the other two tasks (parallelism and orientation), which do require orientation estimation. Interestingly, orientation invariance is more characteristic of higher visual areas, which, consistent with the Reverse Hierarchy Theory, are engaged more rapidly in learning compared to lower visual areas. This simpler explanation, grounded in established concepts of VPL and the tuning properties of neurons across the visual cortex, can account for the observed effects, at least in one scenario. This approach has previously been used/proposed to explain VPL generalization, as seen in (Chowdhury and DeAngelis, Neuron, 2008), (Liu and Pack, Neuron, 2017), and (Bakhtiari et al., JoV, 2020). The question then is: how does the concept of invariant stability provide additional insights beyond this simpler explanation?

      Response: We appreciate the alternative explanation proposed by the reviewer and agree that it presents a valid perspective grounded in established concepts of VPL and neural tuning properties. However, the collinearity and parallelism tasks both require orientation invariance. While orientation invariance, as proposed by the reviewer, can explain the lack of transfer from the collinearity or parallelism task to the orientation task, it cannot explain why collinearity does not transfer to parallelism.

      As stated in the response to the previous review, in the revised discussion section, we will compare our study with other studies (including the three papers mentioned by the reviewer), aiming to clarify the necessity of the concept of invariant stability for interpreting the observed data and understanding the mechanisms underlying VPL generalization.

      • While the paper discusses the transfer of learning between tasks with varying levels of invariant stability, the mechanism of this transfer within each invariant condition remains unclear. A more detailed analysis would involve keeping the invariant's stability constant while altering a feature of the stimulus in the test condition. For example, in the VPL literature, one of the primary methods for testing generalization is examining transfer to a new stimulus location. The paper does not address the expected outcomes of location transfer in relation to the stability of the invariant. Moreover, in the affine and Euclidean conditions one could maintain consistent orientations for the distractors and targets during training, then switch them in the testing phase to assess transfer within the same level of invariant structural stability.

      Response: Thanks for raising the issue regarding the mechanism of transfer within each invariant condition. We plan to design an additional experiment that is similar in paradigm to Experiment 2, aiming to examine how VPL generalizes to a new test location within a single invariant stability level.

      • In the section detailing the modeling experiment using deep neural networks (DNN), the takeaway was unclear. While it was interesting to observe that the DNN exhibited a generalization pattern across conditions similar to that seen in the human experiments, the claim made in the abstract and introduction that the model provides a 'mechanistic' explanation for the phenomenon seems overstated. The pattern of weight changes across layers, as depicted in Figure 7, does not conclusively explain the observed variability in generalizations. Furthermore, the substantial weight change observed in the first two layers during the orientation discrimination task is somewhat counterintuitive. Given that neurons in early layers typically have smaller receptive fields and narrower tunings, one would expect this to result in less transfer, not more.

      Response: We appreciate the reviewer's feedback regarding the clarity of our DNN modeling experiment. We acknowledge that while DNNs have been demonstrated to serve as models for visual systems as well as VPL, the claim that the model provides a ‘mechanistic’ explanation for the phenomenon is still overstated. In our revised version, we will attempt a more detailed analysis of the DNN model while providing a more explicit explanation of the findings from the DNN modeling experiment, emphasizing its implications for understanding the observed variability in generalizations.

      Additionally, the substantial weight change observed in the first two layers during the orientation discrimination task is not contradictory to the theoretical framework we proposed; instead, it aligns with our speculation regarding the neural mechanisms of VPL for geometric invariants. Specifically, it suggests that invariants with lower stability rely more on the plasticity of lower-level brain areas, thus exhibiting poorer generalization performance to new locations or stimulus features within each invariant condition. However, it does not imply that their learning effects cannot transfer to invariants with higher stability.

      Reviewer #2 (Public Review):

      The strengths of this paper are clear: The authors are asking a novel question about geometric representation that would be relevant to a broad audience. Their question has a clear grounding in pre-existing mathematical concepts, that, to my knowledge, have been only minimally explored in cognitive science. Moreover, the data themselves are quite striking, such that my only concern would be that the data seem almost too clean. It is hard to know what to make of that, however. From one perspective, this is even more reason the results should be publicly available. Yet I am of the (perhaps unorthodox) opinion that reviewers should voice these gut reactions, even if it does not influence the evaluation otherwise. Below I offer some more concrete comments:

      (1) The justification for the designs is not well explained. The authors simply tell the audience in a single sentence that they test projective, affine, and Euclidean geometry. But despite my familiarity with these terms -- familiarity that many readers may not have -- I still had to pause for a very long time to make sense of how these considerations led to the stimuli that were created. I think the authors must, for a point that is so central to the paper, thoroughly explain exactly why the stimuli were designed the way that they were and how these designs map onto the theoretical constructs being tested.

      (2) I wondered if the design in Experiment 1 was flawed in one small but critical way. The goal of the parallelism stimuli, I gathered, was to have a set of items that is not parallel to the other set of items. But in doing that, isn't the manipulation effectively the same as the manipulation in the orientation stimuli? Both functionally involve just rotating one set by a fixed amount. (Note: This does not seem to be a problem in Experiment 2, in which the conditions are more clearly delineated.)

      (3) I wondered if the results would hold up for stimuli that were more diverse. It seems that a determined experimenter could easily design an "adversarial" version of these experiments for which the results would be unlikely to replicate. For instance: In the orientation group in Experiment 1, what if the odd-one-out was rotated 90 degrees instead of 180 degrees? Intuitively, it seems like this trial type would now be much easier, and the pattern observed here would not hold up. If it did hold up, that would provide stronger support for the authors' theory.

      It is not enough, in my opinion, to simply have some confirmatory evidence of this theory. One would have to have thoroughly tested many possible ways that theory could fail. I'm unsure that enough has been done here to convince me that these ideas would hold up across a more diverse set of stimuli.

      Response: (1) We appreciate the reviewer’s feedback regarding the justification for our experimental designs. We recognize the importance of thoroughly explaining how our stimuli were designed and how these designs correspond to the theoretical constructs being tested. In our revised version, we will expand the introduction of the Erlangen program and provide a more detailed explanation of the rationale behind our stimulus designs, aiming to improve the clarity and transparency of our experimental approach for readers who may not be familiar with these concepts.

      (2) We appreciate the reviewer’s insight into the design of Experiment 1 and the concern regarding the potential similarity between the parallelism and orientation stimuli manipulations.

      The parallelism and orientation stimuli in Experiment 1 were first used by Olson & Attneave (1970) to support line-based models of shape coding and then adapted to measure the relative salience of different geometric properties (Chen, 1986). In the parallelism stimuli, the odd quadrant differs from the rest in line slope, while in the orientation stimuli, in contrast, the odd quadrant contains exactly the same line segments as the rest but differs in direction pointed by the angles. The result, that the odd quadrant was detected much faster in the parallelism stimuli than in the orientation stimuli, can serve as evidence for line-based models of shape coding. However, according to Chen (1986, 2005), the idea of invariants over transformations suggests a new analysis of the data: in the parallelism stimuli, the fact that line segments share the same slope essentially implies that they are parallel, and the discrimination may be actually based on parallelism. Thus, the faster discrimination of the parallelism stimuli than that of the orientation stimuli may be explained in terms of relative superiority of parallelism over orientation of angles—a Euclidean property.

      The group of stimuli in Experiment 1 has been employed by several studies to investigate scientific questions related to Klein’s hierarchy of geometries (L. Chen, 2005; Meng et al., 2019; B. Wang et al., n.d.). To remain consistent with this earlier work, we adopted this set of stimuli and the corresponding paradigm, despite their imperfect design.

      (3) Thanks for raising the important issue of stimulus diversity and the potential for "adversarial" versions of the experiments to challenge our findings. We acknowledge the validity of your concern and recognize the need to demonstrate the robustness of our results across a range of stimuli. We plan to design additional experiments to investigate the potential implications of varying stimulus characteristics, such as different rotation angles proposed by the reviewer, on the observed patterns of performance.

    2. eLife assessment

      This important study proposes a framework to understand and predict generalization in visual perceptual learning in humans based on form invariants. Using behavioral experiments in humans and by training deep networks, the authors offer evidence that the presence of stable invariants in a task leads to faster learning. However, this interpretation is promising but incomplete. It can be strengthened through clearer theoretical justification, additional experiments, and by rejecting alternate explanations.

    3. Reviewer #1 (Public Review):

      Summary:

      Visual Perceptual Learning (VPL) results in varying degrees of generalization to tasks or stimuli not seen during training. The question of which stimulus or task features predict whether learning will transfer to a different perceptual task has long been central in the field of perceptual learning, with numerous theories proposed to address it. This paper introduces a novel framework for understanding generalization in VPL, focusing on the form invariants of the training stimulus. Contrary to a previously proposed theory that task difficulty predicts the extent of generalization - suggesting that more challenging tasks yield less transfer to other tasks or stimuli - this paper offers an alternative perspective. It introduces the concept of task invariants and investigates how the structural stability of these invariants affects VPL and its generalization. The study finds that tasks with high-stability invariants are learned more quickly. However, training with low-stability invariants leads to greater generalization to tasks with higher stability, but not the reverse. This indicates that, at least based on the experiments in this paper, an easier training task results in less generalization, challenging previous theories that focus on task difficulty (or precision). Instead, this paper posits that the structural stability of stimulus or task invariants is the key factor in explaining VPL generalization across different tasks.

      Strengths:

      - The paper effectively demonstrates that the difficulty of a perceptual task does not necessarily correlate with its learning generalization to other tasks, challenging previous theories in the field of Visual Perceptual Learning. Instead, it proposes a significant and novel approach, suggesting that the form invariants of training stimuli are more reliable predictors of learning generalization. The results consistently bolster this theory, underlining the role of invariant stability in forecasting the extent of VPL generalization across different tasks.

      - The experiments conducted in the study are thoughtfully designed and provide robust support for the central claim about the significance of form invariants in VPL generalization.

      Weaknesses:

      - The paper assumes a considerable familiarity with the Erlangen program and the definitions of invariants and their structural stability, potentially alienating readers who are not versed in these concepts. This assumption may hinder the understanding of the paper's theoretical rationale and the selection of stimuli for the experiments, particularly for those unfamiliar with the Erlangen program's application in psychophysics. A brief introduction to these key concepts would greatly enhance the paper's accessibility. The justification for the chosen stimuli and the design of the three experiments could be more thoroughly articulated.

      - The paper does not clearly articulate how its proposed theory can be integrated with existing observations in the field of VPL. While it acknowledges previous theories on VPL generalization, the paper falls short in explaining how its framework might apply to classical tasks and stimuli that have been widely used in the VPL literature, such as orientation or motion discrimination with Gabors, vernier acuity, etc. It also does not provide insight into the application of this framework to more naturalistic tasks or stimuli. If the stability of invariants is a key factor in predicting a task's generalization potential, the paper should elucidate how to define the stability of new stimuli or tasks. This issue ties back to the earlier mentioned weakness: namely, the absence of a clear explanation of the Erlangen program and its relevant concepts.

      - The paper does not convincingly establish the necessity of its introduced concept of invariant stability for interpreting the presented data. For instance, consider an alternative explanation: performing in the collinearity task requires orientation invariance. Therefore, it's straightforward that learning the collinearity task doesn't aid in performing the other two tasks (parallelism and orientation), which do require orientation estimation. Interestingly, orientation invariance is more characteristic of higher visual areas, which, consistent with the Reverse Hierarchy Theory, are engaged more rapidly in learning compared to lower visual areas. This simpler explanation, grounded in established concepts of VPL and the tuning properties of neurons across the visual cortex, can account for the observed effects, at least in one scenario. This approach has previously been used/proposed to explain VPL generalization, as seen in (Chowdhury and DeAngelis, Neuron, 2008), (Liu and Pack, Neuron, 2017), and (Bakhtiari et al., JoV, 2020). The question then is: how does the concept of invariant stability provide additional insights beyond this simpler explanation?

      - While the paper discusses the transfer of learning between tasks with varying levels of invariant stability, the mechanism of this transfer within each invariant condition remains unclear. A more detailed analysis would involve keeping the invariant's stability constant while altering a feature of the stimulus in the test condition. For example, in the VPL literature, one of the primary methods for testing generalization is examining transfer to a new stimulus location. The paper does not address the expected outcomes of location transfer in relation to the stability of the invariant. Moreover, in the affine and Euclidean conditions one could maintain consistent orientations for the distractors and targets during training, then switch them in the testing phase to assess transfer within the same level of invariant structural stability.

      - In the section detailing the modeling experiment using deep neural networks (DNN), the takeaway was unclear. While it was interesting to observe that the DNN exhibited a generalization pattern across conditions similar to that seen in the human experiments, the claim made in the abstract and introduction that the model provides a 'mechanistic' explanation for the phenomenon seems overstated. The pattern of weight changes across layers, as depicted in Figure 7, does not conclusively explain the observed variability in generalizations. Furthermore, the substantial weight change observed in the first two layers during the orientation discrimination task is somewhat counterintuitive. Given that neurons in early layers typically have smaller receptive fields and narrower tunings, one would expect this to result in less transfer, not more.

    4. Reviewer #2 (Public Review):

      The strengths of this paper are clear: The authors are asking a novel question about geometric representation that would be relevant to a broad audience. Their question has a clear grounding in pre-existing mathematical concepts, that, to my knowledge, have been only minimally explored in cognitive science. Moreover, the data themselves are quite striking, such that my only concern would be that the data seem almost *too* clean. It is hard to know what to make of that, however. From one perspective, this is even more reason the results should be publicly available. Yet I am of the (perhaps unorthodox) opinion that reviewers should voice these gut reactions, even if it does not influence the evaluation otherwise. Below I offer some more concrete comments:

      (1) The justification for the designs is not well explained. The authors simply tell the audience in a single sentence that they test projective, affine, and Euclidean geometry. But despite my familiarity with these terms -- familiarity that many readers may not have -- I still had to pause for a very long time to make sense of how these considerations led to the stimuli that were created. I think the authors must, for a point that is so central to the paper, thoroughly explain exactly why the stimuli were designed the way that they were and how these designs map onto the theoretical constructs being tested.

      (2) I wondered if the design in Experiment 1 was flawed in one small but critical way. The goal of the parallelism stimuli, I gathered, was to have a set of items that is not parallel to the other set of items. But in doing that, isn't the manipulation effectively the same as the manipulation in the orientation stimuli? Both functionally involve just rotating one set by a fixed amount. (Note: This does not seem to be a problem in Experiment 2, in which the conditions are more clearly delineated.)

      (3) I wondered if the results would hold up for stimuli that were more diverse. It seems that a determined experimenter could easily design an "adversarial" version of these experiments for which the results would be unlikely to replicate. For instance: In the orientation group in Experiment 1, what if the odd-one-out was rotated 90 degrees instead of 180 degrees? Intuitively, it seems like this trial type would now be much easier, and the pattern observed here would not hold up. If it did hold up, that would provide stronger support for the authors' theory.

      It is not enough, in my opinion, to simply have some confirmatory evidence of this theory. One would have to have thoroughly tested many possible ways that theory could fail. I'm unsure that enough has been done here to convince me that these ideas would hold up across a more diverse set of stimuli.

    1. Author Response

      We would like to thank the editors and reviewers who took their valuable time to evaluate the manuscript from various perspectives. We are delighted that our technique was found appealing to biologists and imaging technologists. However, we received several comments that the principles and effectiveness of our techniques are often vague and difficult to understand. The reviewers also pointed out that the explanations and representations for several figures were not appropriate. We will revise the manuscript to address these issues and make it clearer and more rigorous.

    2. eLife assessment

      This important study established a large-scale objective and integrated multiple optical microscopy systems to demonstrate their potential for long-term imaging of the developmental process. The convincing imaging data cover a wide range of biological applications, such as organoids, mouse brains, and quail embryos, but improving image quality could further enhance the method's effectiveness. This work will appeal to biologists and imaging technologists focused on long-term imaging of large fields.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors are trying to develop a microscopy system that generates data output exceeding the previous systems based on huge objectives.

      Strengths:

      They have accomplished building such a system, with a field of view of 1.5 × 1.0 cm² and a resolution of up to 1.2 µm. They have also demonstrated their system performance on samples such as organoids, brain sections, and embryos.

      Weaknesses:

      To be used as a volumetric imaging technique, the authors only showcase the implementation of multi-focal confocal sectioning. On the other hand, most of the real biological samples were acquired under wide-field illumination, and processed with so-called computational sectioning. Despite the claim that it improves the contrast, sometimes I felt that the images were oversharpened and the quantitative nature of these fluorescence images may be perturbed.

    4. Reviewer #2 (Public Review):

      Summary:

      This manuscript introduced a volumetric trans-scale imaging system with an ultra-large field-of-view (FOV) that enables simultaneous observation of the dynamics of millions of cells in centimeter-wide 3D tissues and embryos. In terms of technique, this paper is just a minor improvement of the authors' previous work, which is a fluorescence imaging system working in the visible wavelength region (https://www.nature.com/articles/s41598-021-95930-7).

      Strengths:

      In this study, the authors enhanced the system's resolution and sensitivity by increasing the numerical aperture (NA) of the lens. Furthermore, they achieved volumetric imaging by integrating optical sectioning and computational sectioning. This study encompasses a broad range of biological applications, including imaging and analysis of organoids, mouse brains, and quail embryos. Overall, this method is useful and versatile.

      Weaknesses:

      The unique application that can only be done by this high-throughput system remains vague. Meanwhile, there are also several outstanding issues in this paper, such as the lack of technical advances, unclear method details, and non-standardized figures.

    1. eLife assessment

      This study presents a valuable characterization of the biochemical consequences of a disease-associated point mutation in a nonmuscle actin. The study uses well-characterized in vitro assays to explore function. The data are convincing and should be helpful to others.

    2. Reviewer #2 (Public Review):

      Greve et al. investigated the effects of a disease-associated gamma-actin mutation (E334Q) on actin filament polymerization, association of selected actin-binding proteins, and myosin activity. Recombinant wildtype and mutant proteins expressed in Sf9 cells were found to be folded and stable, and the presence of the mutation altered a number of activities. Given the location of the mutation, it is not surprising that there are changes in polymerization and interactions with actin-binding proteins.

      Comments on revised version:

      I have nothing to add and am satisfied with the rebuttal.

    3. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a useful characterization of the biochemical consequences of a disease-associated point mutation in a nonmuscle actin. The study uses solid and well-characterized in vitro assays to explore function. In some cases the statistical analyses are inadequate and several important in vitro assays are not employed.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      The authors first perform several important controls to show that the expressed mutant actin is properly folded, and then show that the Arp2/3 complex behaves similarly with WT and mutant actin via a TIRF microscopy assay as well as a bulk pyrene-actin assay. A TIRF assay showed a small but significant reduction in the rate of elongation of the mutant actin suggesting only a mild polymerization defect.

      Based on in silico analysis of the close location of the actin point mutation and bound cofilin, cofilin was chosen for further investigation. Faster de novo nucleation by cofilin was observed with mutant actin. In contrast, the mutant actin was more slowly severed. Both effects favor the retention of filamentous mutant actin. In solution, the effect of cofilin concentration and pH was assessed for both WT and mutant actin filaments, with a more limited repertoire of conditions in a TIRF assay that directly showed slower severing of mutant actin.

      Lastly, the mutated residue in actin is predicted to interact with the cardiomyopathy loop in myosin and thus a standard in vitro motility assay with immobilized motors was used to show that non-muscle myosin 2A moved mutant actin more slowly, explained in part by a reduced affinity for the filament deduced from transient kinetic assays. By the same motility assay, myosin 5A also showed impaired interaction with the mutant filaments.

      The Discussion is interesting and concludes that the mutant actin will co-exist with WT actin in filaments, and will contribute to altered actin dynamics and poor interaction with relevant myosin motors in the cellular context. While not an exhaustive list of possible defects, this is a solid start to understanding how this mutation might trigger a disease phenotype.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      • Potential assembly defects of the mutant actin could be more thoroughly investigated if the same experiment shown in Fig. 2 was repeated as a function of actin concentration, which would allow the rate of disassembly and the critical concentration to also be determined.

      The polymerization rate of individual filaments observed in TIRFM experiments showed only minor changes, as did the bulk-polymerization rate of 2 µM actin in pyrene-actin based experiments. Therefore, we decided not to perform additional pyrene-actin based experiments, in which we titrate the actin concentration, as we expect only very small changes to the critical concentration. Instead, we focused on the disturbed interaction with ABPs, as we assume these defects to be more relevant in an in vivo context. Using pyrene-based bulk experiments, we did determine the rate of dilution-induced depolymerization of mutant filaments and compared it with the values determined for WT (Figure 5A, Table 1).

      • The more direct TIRF assay for cofilin severing was only performed at high cofilin concentration (100 nM). Lower concentrations of cofilin would also be informative, as well as directly examining by the TIRF assay the effect of cofilin on filaments composed of a 50:50 mixture of WT:mutant actin, the more relevant case for the cell.

      The TIRF assay for cofilin severing was performed initially over the cofilin concentration range from 20 to 250 nM. The results obtained in the presence of 100 nM cofilin allow a particularly informative depiction of the differences observed with mutant and WT actin. This applies to the image series showing the changes in filament length, cofilin clusters, and filament number as well as to the graphs showing time dependent changes in the number of filaments and total actin fluorescence. We have not included the results for a 50:50 mixture of WT:mutant actin because its attenuating effect is documented in several other experiments in the manuscript.

      • The more appropriate assay to determine the effect of the actin point mutation on class 5 myosin would be the inverted assay where myosin walks along single actin filaments adhered to a coverslip. This would allow an evaluation of class 5 myosin processivity on WT versus mutant actin that more closely reflects how Myo5 acts in cells, instead of the ensemble assay used appropriately for myosin 2.

      Our results with Myo5A show a less productive interaction with mutant actin filaments as indicated by a 1.7-fold reduction in the average sliding velocity and an increase in the optimal Myo5A-HMM surface density from 770 to 3100 molecules per µm². These results indicate a reduction in binding affinity and coupling efficiency, with a likely impact on processivity. We expect only a small incremental gain in knowledge about the extent of changes by performing additional experiments with an inverted assay geometry, given that under physiological conditions the motor properties of Myo5A and other cytoskeletal myosins are modulated by other factors such as the presence of tropomyosin isoforms and other actin binding proteins.

      Reviewer #2 (Public Review):

      Greve et al. investigated the effects of a disease-associated gamma-actin mutation (E334Q) on actin filament polymerization, association of selected actin-binding proteins, and myosin activity. Recombinant wildtype and mutant proteins expressed in sf9 cells were found to be folded and stable, and the presence of the mutation altered a number of activities. Given the location of the mutation, it is not surprising that there are changes in polymerization and interactions with actin binding proteins. Nevertheless, it is important to quantify the effects of the mutation to better understand disease etiology.

      We thank the reviewer for the positive evaluation of our work.

      Some weaknesses were identified in the paper as discussed below.

      • Throughout the paper, the authors report average values and the standard-error-of-the-mean (SEM) for groups of three experiments. Reporting the SEM is not appropriate or useful for so few points, as it does not reflect the distribution of the data points. When only three points are available, it would be better to just show the three different points. Otherwise, plot the average and the range of the three points.

      We have gone through the manuscript carefully to correct any errors in the statistics, as explained below.

      Figure 1B, 5B, 5C, 5D, 8D, 9B, and 8 – figure supplement 2 all show the mean ± SD, as also correctly reported for Figure 8E and 8F in the figure legend. The statement, that these figures show the mean ± SEM was inaccurate. We corrected this mistake for all the listed figures. Furthermore, we now give the exact N for every experiment in the figure legend.

      Figure 2C, 2E, 2F, 4B, 5A, 6B-E showed the mean ± SEM. As suggested by the reviewer, we corrected the figures to show the mean ± SD.

      We still refer to the mean ± SEM in Figure 2B, where elongation rates for more than 100 filaments were recorded, and in Figure 8B, where sliding velocities for several thousand actin filaments were measured.
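      The distinction between SD and SEM discussed above can be illustrated with a minimal sketch. The three values below are hypothetical and are not taken from the manuscript; the helper name `mean_sd_sem` is likewise an assumption made for illustration.

```python
import math
import statistics


def mean_sd_sem(values):
    """Return (mean, sample SD, SEM) for a list of measurements."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(n)         # SEM = SD / sqrt(n), shrinks as n grows
    return mean, sd, sem


# Hypothetical triplicate, for illustration only.
triplicate = [9.0, 10.0, 11.0]
mean, sd, sem = mean_sd_sem(triplicate)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

      With only three points the SEM is smaller than the SD by a factor of √3 and says little about the spread of the data, which is why the SD (or the individual points) is the more informative choice for small N, whereas the SEM remains reasonable for the data sets with hundreds or thousands of filaments.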

      • The description and characterization of the recombinant actin is incomplete. Please show gels of purified proteins. This is especially important with this preparation since the chymotrypsin step could result in internally cleaved proteins and altered properties, as shown by Ceron et al (2022). The authors should also comment on N-terminal acetylation of actin.

      We added an additional figure showing the purification strategy for the recombinant cytoskeletal γ-actin WT and p.E334Q protein with exemplary SDS-gels from different stages of purification (Figure 1 – figure supplement 1).

      In a previous paper, we reported the mass spectrometric analysis of the post-translational modifications of recombinant human β- and γ-cytoskeletal actin produced in Sf-9 cells (Müller et al., 2013, Plos One). Recombinant actin showing complete N-terminal processing, resulting in cleavage of the initial methionine and acetylation of the following aspartate (β-actin) or glutamate (γ-actin), is the predominant species in the analyzed preparations (> 95 %). While the recombinant actin in the 2013 study was produced tag-free and purified by affinity chromatography using the column-immobilized actin-binding domain of gelsolin (G4-G6), we have no reason to assume that the purification strategy using the actin-thymosin-β4 fusion construct changes the efficiency of the N-terminal processing in Sf-9 cells. This is supported by our as yet unpublished mass-spectrometric studies on recombinant human α-cardiac actin purified using the actin-thymosin-β4 fusion construct, which revealed actin species with an acetylated aspartate-3. This N-terminal modification of α-cardiac actin is catalyzed by the same actin-specific acetyltransferase (NAA80) as the acetylation of aspartate-2 or glutamate-2 in cytoskeletal actin isoforms (Varland et al., 2019, Trends in Biochemical Sciences). Furthermore, additional studies that used the actin-thymosin-β4 fusion construct for the production of recombinant human cytoskeletal actin isoforms in Pichia pastoris reported robust N-terminal acetylation when the actin was co-produced with NAA80 (in contrast to Sf-9 cells, NAA80 is not endogenously expressed in Pichia pastoris) (Hatano et al., 2020, Journal of Cell Science).

      We therefore added the following statement to the manuscript:

      “Purification of the fusion protein by immobilized metal affinity chromatography, followed by chymotrypsin–mediated cleavage of C–terminal linker and tag sequences, results in homogeneous protein without non–native residues and native N-terminal processing, which includes cleavage of the initial methionine and acetylation of the following glutamate.”

      • The authors do not use the best technique to assess actin polymerization parameters. Although the TIRF assay is excellent for some measurements, it is not as good as the standard pyrene-actin assays that provide critical concentration, nucleation, and polymerization parameters. The authors use pyrene-actin in other parts of the paper, so it is not clear why they don't do the assays that are the standard in the actin field.

      The polymerization rate of individual filaments observed in TIRFM experiments showed only minor changes, as did the bulk-polymerization rate of 2 µM actin in pyrene-actin based experiments. Therefore, we decided not to perform additional pyrene-actin based experiments, in which we titrate the actin concentration, as we expect only very small changes to the critical concentration. Instead, we focused on the disturbed interaction with ABPs, as we assume these defects to be more relevant in an in vivo context. Using pyrene-based bulk experiments, we did determine the rate of dilution-induced depolymerization of mutant filaments and compared it with the values determined for WT (Figure 5A, Table 1).

      • The authors' data suggest that, while the binding of cofilin-1 to both the WT and mutant actins remains similar, the major defect of the E334Q actin is that it is not as readily severed/disassembled by cofilin. What is missing is a direct measurement of the severing rate (number of breaks per second) as measured in TIRF.

      The severing rate as measured in TIRF is dependent on a number of parameters in a nonlinear manner. Therefore, we opted to show the combination of images directly showing the progress of the reaction and graphs summarizing the concomitant changes in cofilin clusters, actin filaments, actin-related fluorescence intensity and cofilin-related fluorescence intensity.

      • Figure 4 shows that the E334Q mutation increases rather than decreases the number of filaments that spontaneously assemble in the TIRF assay, but it is unclear how reduced severing would lead to increased filament numbers, rather, the opposite would be expected. A more straightforward approach would be to perform experiments where severing leads to more nuclei and therefore enhances the net bulk assembly rate.

      Figure 4 shows polymerization experiments that were started from ATP-G-actin in the presence of cofilin-1. These experiments show clearly that, especially at the higher cofilin-1 concentration (100 nM), the filament number is strongly increased in experiments performed with mutant actin. Inspection of the corresponding videos of these TIRFM experiments suggests that the increased number of filaments must result from an increased number of de novo nucleation events and not primarily from a mutation-induced change in severing susceptibility. The observation of a cofilin-stimulated increase in the de novo nucleation efficiency of actin was initially described by Andrianantoandro & Pollard (2006, Molecular Cell) using TIRFM-based experiments and is thought to arise from the stabilization of thermodynamically unfavorable actin dimers and trimers by cofilin. While the exact role of this cofilin-mediated effect in vivo is not completely clear, it is thought to contribute to cofilin-mediated actin dynamics synergistically with cofilin-mediated severing. It is therefore necessary to clearly distinguish between the two effects of cofilin in vitro: stimulation of de novo nucleation and stimulation of filament disassembly. Our data indicate that the E334Q mutation affects these two effects differentially, as we state in the abstract and in the discussion.

      Abstract: “E334Q differentially affects cofilin-mediated actin dynamics by increasing the rate of cofilin-mediated de novo nucleation of actin filaments and decreasing the efficiency of cofilin-mediated filament severing.”

      Discussion: “Cofilin-mediated severing and nucleation were previously proposed to synergistically contribute to global actin turnover in cells (Andrianantoandro & Pollard, 2006; Du & Frieden, 1998). Our results show that the mutation affects these different cofilin functions in actin dynamics in opposite ways. Cofilin-mediated filament nucleation is more efficient for p.E334Q monomers, while cofilin-mediated severing of filaments containing p.E334Q is significantly reduced. The interaction of both actin monomers and actin filaments with ADF/cofilin proteins involves several distinct overlapping reactions. In the case of actin filaments, cofilin binding is followed by structural modification of the filament, severing and depolymerizing the filament (De La Cruz & Sept, 2010). Cofilin binding to monomeric actin is followed by the closure of the nucleotide cleft and the formation of stabilized “long-pitch” actin dimers, which stimulate nucleation (Andrianantoandro & Pollard, 2006)”.

      We interpret the reviewer's suggestion to mean that additional pyrene-actin-based bulk polymerization experiments should be performed to investigate the bulk-polymerization rate of ATP-G-actin in the presence of cofilin-1. In our understanding, these experiments would not provide additional value, as 1) an observed increase of the bulk-polymerization rate cannot be directly correlated to a change of the efficiency of de novo nucleation or severing and 2) the effect of the mutation on cofilin-mediated filament disassembly was extensively analyzed in other experiments starting from preformed actin filaments. Moreover, our results are consistent with in silico modelling and normal mode analysis of the WT and mutant actin-cofilin complex.

      • Figure 5 A: in the pyrene disassembly assay, where actin is diluted below its critical concentration, cofilin enhances the rate of depolymerization by generating more free ends. The E334Q mutation leads to decreased cofilin-induced severing and therefore lower depolymerization. While these data seem convincing, it would be better to present them as an XY plot and fit the data to lines for comparison of the slopes.

      We now present the data as suggested by the reviewer. Furthermore, we determined the apparent second-order rate constant for cofilin-induced F-actin depolymerization (kC) to quantify the observed differences between WT, mutant and heterofilaments.

      The paragraph describing these results was changed accordingly:

      “The observed rate constant values are linearly dependent on the concentration of cofilin–1 in the range 0–40 nM, with the slope corresponding to the apparent second–order rate constant (kC) for the cofilin-1 induced depolymerization of F–actin. In experiments performed with p.E334Q filaments, the value obtained for kC was 4.2-fold lower (0.81 × 10⁻⁴ ± 0.08 × 10⁻⁴ nM⁻¹ s⁻¹) compared to experiments with WT filaments (3.42 × 10⁻⁴ ± 0.22 × 10⁻⁴ nM⁻¹ s⁻¹). When heterofilaments were used, the effect of the mutation was reduced to a 2.2-fold difference compared to WT filaments (1.54 × 10⁻⁴ ± 0.11 × 10⁻⁴ nM⁻¹ s⁻¹).”
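      How a slope of this kind is obtained, and how the quoted fold differences follow from the reported kC values, can be sketched as below. Only the three kC values come from the text; the cofilin concentrations, the basal rate and the noise-free k_obs values are hypothetical, and `least_squares_line` is an illustrative helper rather than the fitting routine actually used by the authors.

```python
def least_squares_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Apparent second-order rate constants quoted in the response (nM^-1 s^-1).
KC_WT = 3.42e-4
KC_MUTANT = 0.81e-4
KC_HETERO = 1.54e-4

# Hypothetical, noise-free data: k_obs rises linearly with cofilin-1 (0-40 nM).
cofilin_nM = [0.0, 10.0, 20.0, 30.0, 40.0]
BASAL_RATE = 1.0e-3  # assumed rate without cofilin (s^-1), illustration only
k_obs = [BASAL_RATE + KC_WT * c for c in cofilin_nM]

slope, intercept = least_squares_line(cofilin_nM, k_obs)  # slope recovers kC

fold_mutant = KC_WT / KC_MUTANT   # WT relative to p.E334Q filaments
fold_hetero = KC_WT / KC_HETERO   # WT relative to heterofilaments
print(f"slope = {slope:.2e} nM^-1 s^-1")
print(f"fold (mutant) = {fold_mutant:.1f}, fold (hetero) = {fold_hetero:.1f}")
```

      With real data the fit would also return an uncertainty on the slope, which is what the ± values in the quoted paragraph presumably represent.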

      • Figure 5 B and C: the cosedimentation data do not seem to help elucidate the underlying mechanism. While the authors report statistical significance, differences are small, especially for gel densitometry measurements where the error is high, which suggests that there may be little biological significance. Importantly, example gels from these experiments should be shown, if not the complete set included in the supplement. In B, the higher cofilin concentrations would be expected to stabilize the filaments and thus the curve should be U-shaped.

      We do not completely agree with the reviewer on this point. We think the co-sedimentation experiments are useful, as they show that cofilin-1 efficiently binds to mutant filaments, but is less efficient in stimulating disassembly in these endpoint-experiments. This information is not provided by the analysis of the effect of cofilin-1 on the bulk-depolymerization rate and adds to our understanding of the defect of the actin-cofilin interaction for the mutant.

      While we agree with the reviewer on the point that co-sedimentation experiments must be repeated several times to produce reliable data, we cannot fully grasp the reasoning behind the statement “While the authors report statistical significance, differences are small, especially for gel densitometry measurements where the error is high, which suggests that there may be little biological significance.”. We interpret this statement as advice to be cautious when extrapolating the observed perturbances of cofilin-mediated actin dynamics in vitro to the in vivo context. We think we are cautious about this throughout the manuscript.

      The reviewer expects a U-shaped curve, as high cofilin concentrations are reported to stabilize actin filaments by completely decorating the filament before severing-prone boundaries between cofilin-decorated and undecorated regions are generated. We have also performed these experiments with cytoskeletal β-actin and human cofilin-1 and never observed this U shape. This indicates that significant filament disassembly also happens at high cofilin concentrations, most likely directly after mixing of F-actin and cofilin. We cannot rule out that the incubation time plays an important role and that the U shape only appears after longer incubation times. We also want to direct the reviewer to the publication “A Mechanism for Actin Filament Severing by Malaria Parasite Actin Depolymerizing Factor 1 via a Low Affinity Binding Interface” (Wong et al. 2013, JBC), in which comparable co-sedimentation experiments were performed (Figure 5E-G) with rabbit skeletal α-actin and human cofilin-1 and also no U-shaped curves were observed, even at higher molar excess of cofilin-1 compared to our experiments and with longer incubation times (1 hour vs. 10 minutes).

      We have now included an exemplary gel showing co-sedimentation experiments performed with WT, mutant actin and different concentrations of cofilin at pH 7.8 in the manuscript (Figure 5 – figure supplement 2).

      • Figure 5 D: these data show that the binding of cofilin to WT and E334Q actin is approximately the same, with the mutant binding slightly more weakly. It would be clearer if the two plots were normalized to their respective plateaus since the difference in arbitrary units distracts from the conclusion of the figure. If the difference in the plateaus is meaningful, please explain.

      As suggested by the reviewer, we normalized the data for a better understanding of the message conveyed.

      • Figure 6: It is assumed that the authors are trying to show in this figure that cofilin binds both actins approximately the same but does not sever as readily for E334Q actin. The numerous parameters measured do not directly address what the authors are actually trying to show, which presumably is that the rate of severing is lower for E334Q than WT. It is therefore puzzling why no measurement of severing events per second per micron of actin in TIRF is made, which would give a more precise account of the underlying mechanism.

      The severing rate as measured in TIRF is dependent on a number of parameters in a nonlinear manner. Therefore, we opted to show the combination of images directly showing the progress of the reaction and graphs summarizing the concomitant changes in cofilin clusters, actin filaments, actin-related fluorescence intensity and cofilin-related fluorescence intensity.

      • Actin-activated steady-state ATPase data of the NM2A with mutant and WT actin would have been extremely useful and informative. The authors show the ability to make these types of measurements in the paper (NADH assay), and it is surprising that they are not included for assessing the myosin activity. It may be because of limited actin quantities. If this is the case, it should be indicated.

      Indeed, the measurement of the steady-state actin-activated ATPase with recombinant cytoskeletal actin is very material-intensive and therefore costly, as a complete titration of actin is required for the generation of meaningful data. Since the vast majority of our assays involving a myosin family member were performed with NM2A-HMM, we decided to perform a full actin titration of the steady-state actin-activated ATPase of NM2A-HMM with WT and mutant filaments. The results of these experiments are now shown in Figure 8C. The panel showing the results used for determining the dissociation rate constants (k-A) for the interaction of NM2C-2R with p.E334Q or WT γ-actin in the absence of nucleotide was moved to the supplement (Figure 8 – figure supplement 2).

      We added the following paragraph to the Material and Methods section concerning the Steady-State ATPase assay:

      “For measurements of the basal and actin–activated NM2A–HMM ATPase, 0.5 µM MLCK–treated HMM was used. Phalloidin–stabilized WT or mutant F-actin was added over the range of 0–25 µM. The change in absorbance at 340 nm due to oxidation of NADH was recorded in a Multiskan FC Microplate Photometer (Thermo Fisher Scientific, Waltham, MA, USA). The data were fitted to the Michaelis-Menten equation to obtain values for the actin concentration at half-maximal activation of ATP-turnover (Kapp) and for the maximum ATP-turnover at saturated actin concentration (kcat).”
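      The relationship between the actin titration and the two fitted parameters can be sketched as follows. This is a minimal illustration, not the authors' analysis: it ignores the basal ATPase, uses noise-free synthetic rates generated from the WT values reported in the response (kcat = 0.097 s⁻¹, Kapp = 3.20 µM), and recovers the parameters with a Hanes-Woolf linearization as a dependency-free stand-in for the nonlinear least-squares fit that would normally be used. The function names are assumptions.

```python
def michaelis_menten(actin_uM, kcat, kapp):
    """ATP turnover (s^-1) at a given F-actin concentration (µM)."""
    return kcat * actin_uM / (kapp + actin_uM)


def hanes_woolf_fit(actin_uM, rates):
    """Estimate (kcat, Kapp) from the linear form A/v = A/kcat + Kapp/kcat."""
    x = list(actin_uM)
    y = [a / v for a, v in zip(actin_uM, rates)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # equals 1 / kcat
    intercept = mean_y - slope * mean_x  # equals Kapp / kcat
    return 1.0 / slope, intercept / slope


# Synthetic titration; the zero-actin point is omitted because A/v is undefined there.
actin_uM = [1.0, 2.5, 5.0, 10.0, 15.0, 20.0, 25.0]
rates = [michaelis_menten(a, kcat=0.097, kapp=3.20) for a in actin_uM]

kcat_fit, kapp_fit = hanes_woolf_fit(actin_uM, rates)
print(f"kcat = {kcat_fit:.3f} s^-1, Kapp = {kapp_fit:.2f} µM")
```

      Linearized fits weight experimental noise unevenly, so for real measurements a direct nonlinear fit of the hyperbola is generally preferable; the linear form is used here only to keep the sketch self-contained.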

      Furthermore, we added a description of the results of the experiments to the Results section of the manuscript:

      “Using a NADH-coupled enzymatic assay, we determined the ability of p.E334Q and WT filaments to activate the ATPase of NM2A-HMM over the range of 0-25 µM F-actin (Figure 8C). While we observed no significant difference in Kapp, indicated by the actin concentration at half-maximal activation, in experiments with p.E334Q filaments (2.89 ± 0.49 µM) and WT filaments (3.20 ± 0.74 µM), we observed a 28% slower maximal ATP turnover at saturating actin concentration (kcat) with p.E334Q filaments (0.076 ± 0.005 s⁻¹ vs. 0.097 ± 0.002 s⁻¹).”

      • (line 310) The authors state that they "noticed increased rapid dissociation and association events for E334Q filaments" in the motility assay. This observation motivates the authors to assess actin affinities of NM2A-HMM. Although differences in rigor and AM.ADP affinities are found between mutant and WT actins, the actin attachment lifetimes (many minutes) are unlikely to be related to the rapid association and dissociation event seen in the motility assay. Rather, this jiggling is more likely to be related to a lower duty ratio of the myosins, which appears to be the conclusion reached for the myosin-V data. These points should be clarified in the text.

      We changed the text in accordance with the reviewer's suggestion. It reads now: Cytoskeletal γ–actin filaments move with an average sliding velocity of 195.3 ± 5.0 nm s⁻¹ on lawns of surface immobilized NM2A–HMM molecules (Figure 8A, B). For NM2A-HMM densities below about 10,000 molecules per μm², the average sliding speed for cytoskeletal actin filaments drops steeply (Hundt et al, 2016). Filaments formed by p.E334Q actin move 5-fold slower, resulting in an observed average sliding velocity of 39.1 ± 3.2 nm s⁻¹. Filaments copolymerized from a 1:1 mixture of WT and p.E334Q actin move with an average sliding velocity of 131.2 ± 10 nm s⁻¹ (Figure 8A, B). When equal densities of surface-attached WT and mutant filaments were used, we observed that the number of rapid dissociation and association events increased markedly for p.E334Q filaments (Figure 8 – video supplement 7–9).

      Using a NADH-coupled enzymatic assay, we determined the ability of p.E334Q and WT filaments to activate the ATPase of NM2A-HMM over the range of 0-25 µM F-actin (Figure 8C). While we observed no significant difference in Kapp, indicated by the actin concentration at half-maximal activation, in experiments with p.E334Q filaments (2.89 ± 0.49 µM) and WT filaments (3.20 ± 0.74 µM), we observed a 28% slower maximal ATP turnover at saturating actin concentration (kcat) with p.E334Q filaments (0.076 ± 0.005 s⁻¹ vs. 0.097 ± 0.002 s⁻¹). To investigate the impact of the mutation on actomyosin–affinity using transient–kinetic approaches, we determined the dissociation rate constants using a single–headed NM2A–2R construct (Figure 8D). …..

      • (line 327) The authors report that the 1/K1 value is unchanged. There are no descriptions of this experiment in the paper. I am assuming the authors measured the ATP-induced dissociation of actomyosin and determined ATP affinity (K1) from this experiment. If this is the case, they should describe the experiment and show the data, provide a second-order rate constant for ATP binding, and report the max rate of dissociation (k2). This is a kinetic experiment done frequently by this group, so the absence of these details is surprising.

      In the previous version of the manuscript, the method used to determine 1/K1 (ATP-induced dissociation of the actomyosin complex) was described in the Material and Methods paragraph “Transient kinetic analysis of the actomyosin complex” and the values obtained for 1/K1 were given in Table 1. We now included the experimental data as an additional figure in the manuscript (Figure 8 – figure supplement 3). Furthermore, we also give the maximal dissociation rate k+2 and the apparent second-order rate constant for ATP-binding (K1k+2) for the WT and mutant actomyosin complex in Table 1. Therefore, we changed the paragraph in the Results section concerning this experiment to:

      “The apparent ATP–affinity (1/K1), the maximal dissociation rate of NM2A from F-actin in the presence of ATP (k+2), and the apparent second-order rate constant of ATP binding (K1k+2) showed no significant differences for complexes formed between NM2A and WT or p.E334Q filaments (Table 1, Figure 8 – figure supplement 3).”

      and the section in the Material and Methods to:

      “The apparent ATP–affinity of the actomyosin complex was determined by mixing the apyrase–treated, pyrene–labeled, phalloidin–stabilized actomyosin complex with increasing concentrations of ATP at the stopped–flow system. Fitting an exponential function to the individual transients yields the ATP–dependent dissociation rate of NM2A–2R from F–actin (kobs). The kobs–values were plotted against the corresponding ATP concentrations and a hyperbola was fitted to the data. The fit yields the apparent ATP–affinity (1/K1) of the actomyosin complex and the maximal dissociation rate k+2.

      The apparent second–order rate constant for ATP binding (K1k+2) was determined by applying a linear fit to the data obtained at low ATP concentrations (0 – 25 µM).”
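      The hyperbolic and linear fits described above are related as follows, assuming the standard two-step scheme for ATP-induced dissociation of actomyosin (rapid-equilibrium ATP binding with association constant K1, followed by an effectively irreversible step with rate constant k+2). The scheme itself is an assumption here; the symbols are those used in the response.

```latex
% Hyperbolic dependence of the observed rate on ATP concentration
k_{\mathrm{obs}} = \frac{k_{+2}\,[\mathrm{ATP}]}{1/K_{1} + [\mathrm{ATP}]}

% Limit of low ATP, [ATP] << 1/K_1: the initial slope is the apparent
% second-order rate constant for ATP binding
k_{\mathrm{obs}} \approx K_{1}k_{+2}\,[\mathrm{ATP}]
```

      Under this scheme the plateau of the hyperbola gives k+2, the ATP concentration at half-maximal k_obs gives 1/K1, and the linear fit at low ATP concentrations gives K1k+2 directly, provided the chosen range lies well below 1/K1.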

      For a better understanding of the numerous rate and equilibrium constants, we have now included a figure showing the kinetic reaction scheme of the myosin ATPase cycle (Figure 8 – figure supplement 1).

      Recommendations for the authors:

      Reviewer #1:

      • The subdomains of actin are mislabeled in Fig. 1A.

      The labeling of the subdomains has been corrected.

      • Additional experimental data addressing the 3 weaknesses noted in the public review would be informative but are not essential in my opinion. Examining the effect of cofilin on severing by the TIRF assay in more detail and using a processivity assay for myosin V (immobilized actin) would be the two aspects I would most value.

      The TIRF assay for cofilin severing was performed initially over the cofilin concentration range from 20 to 250 nM. The results obtained in the presence of 100 nM cofilin allow a particularly informative depiction of the differences observed with mutant and WT actin. This applies to the image series showing the changes in filament length, cofilin clusters, and filament number as well as to the graphs showing time dependent changes in the number of filaments and total actin fluorescence. We have not included the results for a 50:50 mixture of WT:mutant actin because its attenuating effect is documented in several other experiments in the manuscript.

      Our results with Myo5A show a less productive interaction with mutant actin filaments as indicated by a 1.7-fold reduction in the average sliding velocity and an increase in the optimal Myo5A-HMM surface density from 770 to 3100 molecules per µm². These results indicate a reduction in binding affinity and coupling efficiency, with a likely impact on processivity. Given that Myo5A is only one of many cytoskeletal myosin motors and that the motor properties of all myosins are modulated by the presence of tropomyosin isoforms and other actin binding proteins, we expect only a small incremental gain in knowledge by performing additional experiments with an inverted assay geometry.

      Reviewer #2:

      • The authors should address the concerns regarding the statistical methodologies.

      We have gone through the manuscript carefully to correct any errors in the statistics, as explained below.

      Figure 1B, 5B, 5C, 5D, 8D, 9B, and 8 – figure supplement 2 all show the mean ± SD, as also correctly reported for Figure 8E and 8F in the figure legend. The statement, that these figures show the mean ± SEM was wrong and we corrected this mistake for all the listed figures. Furthermore, we now give the exact N for every experiment in the figure legend.

      Figure 2C, 2E, 2F, 4B, 5A, 6B-E indeed showed the mean ± SEM. As the reviewer rightly points out, this is not the appropriate way to deal with such sample sizes. We therefore corrected the figures to show the mean ± SD.

      We still refer to the mean ± SEM in Figure 2B, where elongation rates for more than 100 filaments were recorded, and in Figure 8B, where sliding velocities for several thousand actin filaments were measured.

      • The authors should present the actin titration of the steady state ATPase activity for at least one of the myosins, or preferably all of them.

      An actin titration of the steady state ATPase activity of NM-2A has been included in the revised version of the manuscript (Fig 8C).

      • The authors should consider the use of pyrene-actin in measuring the assembly/disassembly of actin.

      Values for the rate of actin assembly/disassembly measured with pyrene-actin are given in Table 1. Based on the small changes observed, we did not determine the critical actin concentration for the mutant construct.

    4. Reviewer #1 (Public Review):

      This paper is of importance to scientists interested in molecular mechanisms by which actin point mutations affect its function to ultimately lead to disease states. This work thoroughly characterizes the effect of the E334Q mutation in cytoplasmic gamma-actin on two binding partners: cofilin and myosin (non-muscle myosin 2 and myosin 5). Overall, the data showing effects on cofilin function and myosin binding are convincing and the experiments performed expertly using state-of-the art approaches. Additional binding partners of actin that were not examined here may also have altered function when interacting with the mutant actin.

      Comments on revised version:

      The authors seem to have done a pretty thorough job with the rebuttal.

    1. Author Response

      We thank both the editors and the Reviewers for their thoughtful comments and recommendations, that will certainly help us improve the manuscript. Below we address in a brief format some of the comments made, and then outline the changes to the manuscript that we plan to implement in the revision.

      We see three interrelated issues in the comments of the Reviewers:

      • the length and complexity of the manuscript;

      • the link to previously proposed formalisms;

      • the impact of adopting the proposed information-theoretic framework.

      With regard to all of these issues, we would first like to highlight that the overall goal of our effort was to integrate contributions to understanding the mechanisms underlying cognitive control across multiple different disciplines, using the information theoretic framework as a common formalism, while respecting and building on prior efforts as much as possible. Accordingly, we sought to be as explicit as possible about how we bridge from prior work using information theory, as well as neural networks and dynamical systems theory, which contributed to the length of the original manuscript. While we continue to consider this an important goal, we will do our best to shorten and clarify the main exposition by reorganizing the manuscript as suggested by Reviewer #1 (i.e., in a way that is similar to what we did in our previous Nature Physics paper on multitasking). Specifically, we will move a substantially greater amount of the bridging material to the Supplementary Information (SI), including the detailed discussion of the Stroop task, and the description of the link to Koechlin & Summerfield’s [L1] information theory formalism. We will also now include an outline of the full model, including control and learning, at the beginning of the manuscript, and then more succinctly describe simplifications that focus on specific issues and applications in the remainder of the document.

      Along similar lines, we will revise and harmonize our presentation of the formalism and notations, to make these more consistent, clearer and more concise throughout the document. Again, some of the inconsistencies in notation arose from our initial description of previous work, and in particular that of Koechlin & Summerfield [L1], which was an important inspiration for our work but used slightly different notation. An important motivation for our introduction of new notation was that their formulation focused on the performance of a single task at a time, whereas a primary goal of our work was to extend the information theoretic treatment to simultaneous performance of multiple tasks. That is, in focusing on single tasks, Koechlin & Summerfield could refer to a task simply as a direct association between stimuli and responses, whereas we required a way of referring to sets of tasks performed at once (“multitasks”), which in turn required specification of internal pathways. Moreover, they do not provide a way to explicitly compute the conditional information Q(a|s) of a response/action a conditioned on a stimulus s. Our formalism instead provides a way to explicitly unpack this expression in terms of the efficacies – automatic (Eq. 5) or controlled (Eq. 15) – which can also account for the competition between different stimuli {s1, s2, . . . sn}. It also describes explicitly the competition between multiple tasks (Eq. 18, and Eq. 25 for multiple layers), because different processing schemes for the same combinations of stimuli/responses can incur different levels of internal dependencies and thus require different control strategies.

      To mitigate any confusion over terminology we will, as noted above, move a detailed discussion of Koechlin & Summerfield’s formulation, and how it maps to the one we present, to the SI, while taking care to introduce ours clearly at the beginning of the main document, and use it consistently throughout the remainder of the document. We will also draw more clearly an important distinction, between informational and cognitive costs, that we did not make adequately in the original manuscript.

      Finally, to more clearly and concretely convey what we consider to be the most important contributions, we will restrict the number of examples we present to ones that relate most directly to the central points (e.g., the effect and limits of control in the presence of interference, and the differences in control strategy under limited temporal horizons). Accompanying our revision, we will also provide a full point-by-point response to the comments and questions raised by the Reviewers. We summarize some of the key points we will address below.

      PRELIMINARY REPLY TO THE REPORT OF REVIEWER #1

      We want to thank the Reviewer for the time and effort put into reviewing our paper and for the constructive feedback provided. We also thank the Reviewer for recognizing the need for a clear computational account of how “control” manages conflicts by scheduling tasks to be executed in parallel versus serially, and for the positive evaluation of the “efforts of the authors to give these intuitions a more concrete computational grounding.” As noted in the general reply above, we regret the lack of clarity in several parts of the manuscript and in our introduction and use of the formalism. We consider the following to be the main points to be addressed:

      • the role of task graphs and their mapping to standard neural architectures;

      • the description of entropy and related information-theoretic concepts;

      • confusing choice of symbols in our notation between stimuli/responses and serialization/reconfiguration costs;

      • missing definition of response time.

      Regarding the first point, we acknowledge that the network architectures we focus on do not draw direct inspiration from conventional machine learning models. Instead, our approach is rooted in the longstanding tradition of using (often simpler, but also more readily interpretable) neural network models to address human cognitive function and how this may be implemented in the brain [L2]; and, in particular, the mechanisms underlying cognitive control (e.g., [L3, L4]). In this context, we emphasize that, for analytical clarity, we deliberately abstract away from many biological details, in an effort to identify those principles of function that are most relevant to cognitive function. Nevertheless, our network architecture is inspired by two concepts that are central to neurobiological mechanisms of control: inhibition and gain modulation. Specifically, we incorporate mutual inhibition among neural processing units, a feature represented by the parameter β. This aspect of our model is consistent with biologically inspired frameworks of neural processing, such as those discussed by Munakata et al. (2011) [L5], reflecting the competitive dynamics observed in neural circuits. Moreover, we introduce the parameter ν to represent a strictly modulatory form of control, akin to the role of neuromodulators in the brain. This modulatory control adjusts the sensitivity of a node to differences among its inputs (e.g., Servan-Schreiber, Printz, & Cohen (1990) [L6]; Aston-Jones & Cohen (2005) [L7]). Finally, as the Reviewer notes, additional hidden layers can improve expressivity in neural networks, enabling the efficient implementation of more complex tasks, and are a universal feature of biological and artificial neural systems. We thus examined multitasking capability under the assumption that multiple hidden layers are present in a network, irrespective of whether they are needed to implement the corresponding tasks.

      Regarding the second point, as noted above, we believe that the confusion arose from our review of the work by Koechlin & Summerfield. In their formalism, in which an action a is chosen (from a set of potential actions) with probability p(a), the cost of choosing that action is − log p(a). This is usually referred to as the information content or, alternatively, the localized entropy [L8]. As the Reviewer correctly observed, the canonical (Shannon) entropy is actually the expectation E_a[− log p(a)] over the localized entropies of a set of actions. In summarizing their formulation, we misleadingly stated that “they used standard Shannon entropy formalism as a measure of the information required to select the action a.” We will now correct this to state: “[..] they used local entropy (− log p(a)) as a measure of the information required to select the action a, that can be treated as the cost of choosing that action.” We follow this formulation in our own, referring to informational cost as Ψ, and generalizing this to include cases in which more than one action may be chosen to perform at a time.
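
      The distinction between the per-action cost − log p(a) and the Shannon entropy can be illustrated with a minimal sketch. The function names and the three-action distribution below are hypothetical, chosen only for illustration; they are not taken from the manuscript, and natural logarithms (nats) are an arbitrary choice of units.

```python
import math

def surprisal(p):
    """Information content (local entropy) -log p(a) of a single action, in nats."""
    return -math.log(p)

def shannon_entropy(probs):
    """Expectation of the surprisal over the whole action distribution."""
    return sum(p * surprisal(p) for p in probs if p > 0)

# Hypothetical distribution over three candidate actions (illustrative values only).
p_actions = [0.5, 0.25, 0.25]

# One cost per action: rarer actions are more costly to select.
costs = [surprisal(p) for p in p_actions]

# A single number for the whole distribution: the average of those costs.
entropy = shannon_entropy(p_actions)
```

      Here `costs` holds one value per action, whereas `entropy` is a single value for the distribution, which is the difference the Reviewer pointed out.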

      Regarding the third point, the confusion is due to our use of the letters S and R for both the stimulus and response units (in Sec. II.B) and then serialization and reconstruction costs (in Eqs. 31-33). We will fix this by renaming the serialization and reconstruction costs more explicitly as Ser and Rec.

      Finally, we realized we never explicitly stated the expression for the response time we used, but only pointed to it in the literature. In the manuscript we used the expression given in Eq. 53 of [L9], which provides response times as a function of the error rate ER and the number of options.

      PRELIMINARY REPLY TO THE REPORT OF REVIEWER #2

      We want to thank the Reviewer for recognizing our effort to ”rigorously synthesize ideas about multi-tasking within an information-theoretic framework” and its potential. We also thank the Reviewer for the careful comments.

      To the best of our understanding, and similarly to Reviewer #1, the main comments of the Reviewer are on:

      • the length and density of the paper;

      • the presentation of Koechlin & Summerfield’s formalism, and the mismatch/lack of clarity of ours in certain points;

      • the added value of the information theoretic formalism.

      Regarding the first two points, which are common to Reviewer #1, we plan to move a significant part of the manuscript to the Supplementary Information, both to improve readability and make the manuscript shorter, as well as to provide one consistent and cleaner formalism (in particular with regard to the typos and errors highlighted by the Reviewer). In particular, with respect to the comment on Eqs. 4-6, we will clarify that the probability p[f_ij] is the probability that a certain input dimension (i in this case) is selected by node j to produce its response (averaged over the individual inputs in each input dimension). We will also take care to make sure that the definition and domain of the various probabilities and probability distributions we use are clearly delineated (e.g. where the costs computed for tasks and task pathways come from).

      Regarding the third point, we hope that our work offers value in at least two ways: i) it helps bring unity to ideas and descriptions about the capacity constraints associated with cognitive control that have previously been articulated in different forms (viz., neural networks, dynamical systems, and statistical mechanical accounts); and ii) doing so within an information theoretic framework not only lends rigor and precision to the formulation, but also allows us to cast the allocation of control in normative form – that is, as an optimization problem in which the agent seeks to minimize costs while maximizing gains. While we do not address specific empirical phenomena or datasets in the present treatment, we have done our best to provide examples showing that: a) our information theoretic formulation aligns with treatments using other formalisms that have been used to address empirical phenomena (e.g., with neural network models of the Stroop task); and b) our formulation can be used as a framework for providing a normative approach to widely studied empirical phenomena (e.g., the transition from control-dependent to automatic processing during skill acquisition) that, to date, have been addressed largely from a descriptive perspective; and that it can provide a formally rigorous approach to addressing such phenomena.

      [L1] E. Koechlin and C. Summerfield, Trends in Cognitive Sciences 11, 229 (2007).

      [L2] J. L. McClelland, D. E. Rumelhart, P. R. Group, et al., Explorations in the Microstructure of Cognition 2, 216 (1986).

      [L3] J. D. Cohen, K. Dunbar, and J. L. McClelland, Psychological Review 97, 332 (1990).

      [L4] E. K. Miller and J. D. Cohen, Annual Review of Neuroscience 24, 167 (2001).

      [L5] Y. Munakata, S. A. Herd, C. H. Chatham, B. E. Depue, M. T. Banich, and R. C. O’Reilly, Trends in Cognitive Sciences 15, 453 (2011).

      [L6] D. Servan-Schreiber, H. Printz, and J. D. Cohen, Science 249, 892 (1990).

      [L7] G. Aston-Jones and J. D. Cohen, Annual Review of Neuroscience 28, 403 (2005).

      [L8] T. F. Varley, PLOS ONE 19, e0297128 (2024).

      [L9] T. McMillen and P. Holmes, Journal of Mathematical Psychology 50, 30 (2006).

    1. Author Response

      Reviewer #1 (Public Review):

      This study used a multi-day learning paradigm combined with fMRI to reveal neural changes reflecting the learning of new (arbitrary) shape-sound associations. In the scanner, the shapes and sounds are presented separately and together, both before and after learning. When they are presented together, they can be either consistent or inconsistent with the learned associations. The analyses focus on auditory and visual cortices, as well as the object-selective cortex (LOC) and anterior temporal lobe regions (temporal pole (TP) and perirhinal cortex (PRC)). Results revealed several learning-induced changes, particularly in the anterior temporal lobe regions. First, the LOC and PRC showed a reduced bias to shapes vs sounds (presented separately) after learning. Second, the TP responded more strongly to incongruent than congruent shape-sound pairs after learning. Third, the similarity of TP activity patterns to sounds and shapes (presented separately) was increased for non-matching shape-sound comparisons after learning. Fourth, when comparing the pattern similarity of individual features to combined shape-sound stimuli, the PRC showed a reduced bias towards visual features after learning. Finally, comparing patterns to combined shape-sound stimuli before and after learning revealed a reduced (and negative) similarity for incongruent combinations in PRC. These results are all interpreted as evidence for an explicit integrative code of newly learned multimodal objects, in which the whole is different from the sum of the parts.

      The study has many strengths. It addresses a fundamental question that is of broad interest, the learning paradigm is well-designed and controlled, and the stimuli are real 3D stimuli that participants interact with. The manuscript is well written and the figures are very informative, clearly illustrating the analyses performed.

      There are also some weaknesses. The sample size (N=17) is small for detecting the subtle effects of learning. Most of the statistical analyses are not corrected for multiple comparisons (ROIs), and the specificity of the key results to specific regions is also not tested. Furthermore, the evidence for an integrative representation is rather indirect, and alternative interpretations for these results are not considered.

      We thank the reviewer for their careful reading and the positive comments on our manuscript. As suggested, we have conducted additional analyses of theoretically-motivated ROIs and have found that temporal pole and perirhinal cortex are the only regions to show the key experience-dependent transformations. We are much more cautious with respect to multiple comparisons, and have removed a series of post hoc across-ROI comparisons that were irrelevant to the key questions of the present manuscript. The revised manuscript now includes much more discussion about alternative interpretations as suggested by the reviewer (and also by the other reviewers).

      Additionally, we looked into scanning more participants, but our scanner has since had a full upgrade and the sequence used in the current study is no longer supported. However, we note that while most analyses contain 17 participants, we employed a within-subject learning design that is not typically used in fMRI experiments and increases our power to detect an effect. This is supported by the robust effect size of the behavioural data, whereby 17 out of 18 participants showed a learning effect (Cohen’s d = 1.28), which was replicated in a follow-up experiment with a larger sample size.
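
      For readers unfamiliar with within-subject effect sizes, the following minimal sketch shows one common convention (the mean of the paired differences divided by their standard deviation, often written d_z). The response does not state which variant of Cohen’s d was used, so this convention is an assumption, and the function name and scores are hypothetical.

```python
import statistics

def cohens_dz(before, after):
    """Paired-samples effect size: mean of per-participant differences over their SD."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

# Hypothetical per-participant scores before and after learning (made-up values).
before = [0.50, 0.55, 0.48, 0.60]
after = [0.70, 0.72, 0.61, 0.85]

d = cohens_dz(before, after)
```

      Because each participant serves as their own control, between-participant variability drops out of the denominator, which is why a within-subject design can yield a large standardized effect with a modest sample.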

      We address the other reviewer comments point by point below.

      Reviewer #2 (Public Review):

      Li et al. used a four-day fMRI design to investigate how unimodal feature information is combined, integrated, or abstracted to form a multimodal object representation. The experimental question is of great interest and understanding how the human brain combines featural information to form complex representations is relevant for a wide range of researchers in neuroscience, cognitive science, and AI. While most fMRI research on object representations is limited to visual information, the authors examined how visual and auditory information is integrated to form a multimodal object representation. The experimental design is elegant and clever. Three visual shapes and three auditory sounds were used as the unimodal features; the visual shapes were used to create 3D-printed objects. On Day 1, the participants interacted with the 3D objects to learn the visual features, but the objects were not paired with the auditory features, which were played separately. On Day 2, participants were scanned with fMRI while they were exposed to the unimodal visual and auditory features as well as pairs of visual-auditory cues. On Day 3, participants again interacted with the 3D objects but now each was paired with one of the three sounds that played from an internal speaker. On Day 4, participants completed the same fMRI scanning runs they completed on Day 2, except now some visual-auditory feature pairs corresponded with Congruent (learned) objects, and some with Incongruent (unlearned) objects. Using the same fMRI design on Days 2 and 4 enables a well-controlled comparison between feature- and object-evoked neural representations before and after learning. The notable results corresponded to findings in the perirhinal cortex and temporal pole. The authors report (1) that a visual bias on Day 2 for unimodal features in the perirhinal cortex was attenuated after learning on Day 4, (2) a decreased univariate response to congruent vs. incongruent visual-auditory objects in the temporal pole on Day 4, (3) decreased pattern similarity between congruent vs. incongruent pairs of visual and auditory unimodal features in the temporal pole on Day 4, (4) in the perirhinal cortex, visual unimodal features on Day 2 do not correlate with their respective visual-auditory objects on Day 4, and (5) in the perirhinal cortex, multimodal object representations across Days 2 and 4 are uncorrelated for congruent objects and anticorrelated for incongruent. The authors claim that each of these results supports the theory that multimodal objects are represented in an "explicit integrative" code separate from feature representations. While these data are valuable and the results are interesting, the authors' claims are not well supported by their findings.

      We thank the reviewer for the careful reading of our manuscript and positive comments. Overall, we now stay closer to the data when describing the results and provide our interpretation of these results in the discussion section while remaining open to alternative interpretations (as also suggested by Reviewer 1).

      (1) In the introduction, the authors contrast two theories: (a) multimodal objects are represented in the co-activation of unimodal features, and (b) multimodal objects are represented in an explicit integrative code such that the whole is different than the sum of its parts. However, the distinction between these two theories is not straightforward. An explanation of what is precisely meant by "explicit" and "integrative" would clarify the authors' theoretical stance. Perhaps we can assume that an "explicit" representation is a new representation that is created to represent a multimodal object. What is meant by "integrative" is more ambiguous-unimodal features could be integrated within a representation in a manner that preserves the decodability of the unimodal features, or alternatively the multimodal representation could be completely abstracted away from the constituent features such that the features are no longer decodable. Even if the object representation is "explicit" and distinct from the unimodal feature representations, it can in theory still contain featural information, though perhaps warped or transformed. The authors do not clearly commit to a degree of featural abstraction in their theory of "explicit integrative" multimodal object representations which makes it difficult to assess the validity of their claims.

      Due to its ambiguity, we removed the term “explicit” and now make it clear that our central question was whether crossmodal object representations require only unimodal feature-level representations (e.g., frogs are created from only the combination of shape and sound) or whether crossmodal object representations also rely on an integrative code distinct from the unimodal features (e.g., there is something more to “frog” than its original shape and sound). We now clarify this in the revised manuscript.

      “One theoretical view from the cognitive sciences suggests that crossmodal objects are built from component unimodal features represented across distributed sensory regions.8 Under this view, when a child thinks about “frog”, the visual cortex represents the appearance of the shape of the frog whereas the auditory cortex represents the croaking sound. Alternatively, other theoretical views predict that multisensory objects are not only built from their component unimodal sensory features, but that there is also a crossmodal integrative code that is different from the sum of these parts.9,10,11,12,13 These latter views propose that anterior temporal lobe structures can act as a polymodal “hub” that combines separate features into integrated wholes.9,11,14,15” – pg. 4

      For this reason, we designed our paradigm to equate the unimodal representations, such that neural differences between the congruent and incongruent conditions provide evidence for a crossmodal integrative code different from the unimodal features (because the unimodal features are equated by default in the design).

      “Critically, our four-day learning task allowed us to isolate any neural activity associated with integrative coding in anterior temporal lobe structures that emerges with experience and differs from the neural patterns recorded at baseline. The learned and non-learned crossmodal objects were constructed from the same set of three validated shape and sound features, ensuring that factors such as familiarity with the unimodal features, subjective similarity, and feature identity were tightly controlled (Figure 2). If the mind represented crossmodal objects entirely as the reactivation of unimodal shapes and sounds (i.e., objects are constructed from their parts), then there should be no difference between the learned and non-learned objects (because they were created from the same three shapes and sounds). By contrast, if the mind represented crossmodal objects as something over and above their component features (i.e., representations for crossmodal objects rely on integrative coding that is different from the sum of their parts), then there should be behavioral and neural differences between learned and non-learned crossmodal objects (because the only difference across the objects is the learned relationship between the parts). Furthermore, this design allowed us to determine the relationship between the object representation acquired after crossmodal learning and the unimodal feature representations acquired before crossmodal learning. That is, we could examine whether learning led to abstraction of the object representations such that it no longer resembled the unimodal feature representations.” – pg. 5

      Furthermore, we agree with the reviewer that our definition and methodological design do not directly capture the structure of the integrative code. With experience, the unimodal feature representations may be completely abstracted away, warped, or changed in a nonlinear transformation. We suggest that crossmodal learning forms an integrative code that is different from the original unimodal representations in the anterior temporal lobes; however, we agree that future work is needed to more directly capture the structure of the integrative code that emerges with experience.

      “In our task, participants had to differentiate congruent and incongruent objects constructed from the same three shape and sound features (Figure 2). An efficient way to solve this task would be to form distinct object-level outputs from the overlapping unimodal feature-level inputs such that congruent objects are made to be orthogonal from the representations before learning (i.e., measured as pattern similarity equal to 0 in the perirhinal cortex; Figure 5b, 6, Supplemental Figure S5), whereas non-learned incongruent objects could be made to be dissimilar from the representations before learning (i.e., anticorrelation, measured as pattern similarity less than 0 in the perirhinal cortex; Figure 6). Because our paradigm could decouple neural responses to the learned object representations (on Day 4) from the original component unimodal features at baseline (on Day 2), these results could be taken as evidence of pattern separation in the human perirhinal cortex.11,12 However, our pattern of results could also be explained by other types of crossmodal integrative coding. For example, incongruent object representations may be less stable than congruent object representations, such that incongruent object representations are warped to a greater extent than congruent object representations (Figure 6).” – pg. 18

      “As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience – such as through transformations, warping, or other factors – is an open question and an important area for future investigation.” – pg. 18

      (2) After participants learned the multimodal objects, the authors report a decreased univariate response to congruent visual-auditory objects relative to incongruent objects in the temporal pole. This is claimed to support the existence of an explicit, integrative code for multimodal objects. Given the number of alternative explanations for this finding, this claim seems unwarranted. A simpler interpretation of these results is that the temporal pole is responding to the novelty of the incongruent visual-auditory objects. If there is in fact an explicit, integrative multimodal object representation in the temporal pole, it is unclear why this would manifest in a decreased univariate response.

      We thank the reviewer for identifying this issue. Our behavioural design controls unimodal feature-level novelty but allows object-level novelty to differ. Thus, neural differences between the congruent and incongruent conditions reflect sensitivity to the object-level differences between the combinations of shape and sound. However, we agree that there are multiple interpretations regarding the nature of how the integrative code is structured in the temporal pole and perirhinal cortex. We have removed the interpretation highlighted by the reviewer from the results. Instead, we now provide our preferred interpretation in the discussion, while acknowledging the other possibilities that the reviewer mentions.

      As one possibility, these results in the temporal pole may reflect “conceptual combination”. “Hummingbird” – a congruent pairing – may require fewer neural resources than an incongruent pairing such as “bark-frog”.

      “Furthermore, these distinct anterior temporal lobe structures may be involved with integrative coding in different ways. For example, the crossmodal object representations measured after learning were found to be related to the component unimodal feature representations measured before learning in the temporal pole but not the perirhinal cortex (Figure 5, 6, Supplemental Figure S5). Moreover, pattern similarity for congruent shape-sound pairs was lower than the pattern similarity for incongruent shape-sound pairs after crossmodal learning in the temporal pole but not the perirhinal cortex (Figure 4b, Supplemental Figure S3a). As one interpretation of this pattern of results, the temporal pole may represent new crossmodal objects by combining previously learned knowledge. 8,9,10,11,13,14,15,33 Specifically, research into conceptual combination has linked the anterior temporal lobes to compound object concepts such as “hummingbird”.34,35,36 For example, participants during our task may have represented the sound-based “humming” concept and visually-based “bird” concept on Day 1, forming the crossmodal “hummingbird” concept on Day 3 (Figure 1, 2), which may recruit less activity in temporal pole than an incongruent pairing such as “barking-frog”. For these reasons, the temporal pole may form a crossmodal object code based on pre-existing knowledge, resulting in reduced neural activity (Figure 3d) and pattern similarity towards features associated with learned objects (Figure 4b).” – pg. 18

      (3) The authors ran a neural pattern similarity analysis on the unimodal features before and after multimodal object learning. They found that the similarity between visual and auditory features that composed congruent objects decreased in the temporal pole after multimodal object learning. This was interpreted to reflect an explicit integrative code for multimodal objects, though it is not clear why. First, behavioral data show that participants reported increased similarity between the visual and auditory unimodal features within congruent objects after learning, the opposite of what was found in the temporal pole. Second, it is unclear why an analysis of the unimodal features would be interpreted to reflect the nature of the multimodal object representations. Since the same features corresponded with both congruent and incongruent objects, the nature of the feature representations cannot be interpreted to reflect the nature of the object representations per se. Third, using unimodal feature representations to make claims about object representations seems to contradict the theoretical claim that explicit, integrative object representations are distinct from unimodal features. If the learned multimodal object representation exists separately from the unimodal feature representations, there is no reason why the unimodal features themselves would be influenced by the formation of the object representation. Instead, these results seem to more strongly support the theory that multimodal object learning results in a transformation or warping of feature space.

      We apologize for the lack of clarity. We have now overhauled this aspect of our manuscript in an attempt to better highlight key aspects of our experimental design. In particular, because the unimodal features composing the congruent and incongruent objects were equated, neural differences between these conditions would provide evidence for an experience-dependent crossmodal integrative code that is different from its component unimodal features.

      Related to the second and third points, we were looking at the extent to which the original unimodal representations change with crossmodal learning. Before crossmodal learning, we found that the perirhinal cortex tracked the similarity between the individual visual shape features and the crossmodal objects that were composed of those visual shapes – however, there was no evidence that perirhinal cortex was tracking the unimodal sound features on those crossmodal objects. After crossmodal learning, we see that this visual shape bias in perirhinal cortex was no longer present – that is, the representation in perirhinal cortex started to look less like the visual features that comprise the objects. Thus, crossmodal learning transformed the perirhinal representations so that they were no longer predominantly grounded in a single visual modality, which may be a mechanism by which object concepts gain their abstraction. We have now tried to be clearer about this interpretation throughout the paper.

      Notably, we suggest that experience may change both the crossmodal object representations, as well as the unimodal feature representations. For example, we have previously shown that unimodal visual features are influenced by experience in parallel with the representation of the conjunction (e.g., Liang et al., 2020; Cerebral Cortex). Nevertheless, we remain open to the myriad possible structures of the integrative code that might emerge with experience.

      We now clarify these points throughout the manuscript. For example:

      “We then examined whether the original representations would change after participants learned how the features were paired together to make specific crossmodal objects, conducting the same analysis described above after crossmodal learning had taken place (Figure 5b). With this analysis, we sought to measure the relationship between the representation for the learned crossmodal object and the original baseline representation for the unimodal features. More specifically, the voxel-wise activity for unimodal feature runs before crossmodal learning was correlated to the voxel-wise activity for crossmodal object runs after crossmodal learning (Figure 5b). Another linear mixed model which included modality as a fixed factor within each ROI revealed that the perirhinal cortex was no longer biased towards visual shape after crossmodal learning (F1,32 = 0.12, p = 0.73), whereas the temporal pole, LOC, V1, and A1 remained biased towards either visual shape or sound (F1,30-32 between 16.20 and 73.42, all p < 0.001, η2 between 0.35 and 0.70).” – pg. 14

      “To investigate this effect in perirhinal cortex more specifically, we conducted a linear mixed model to directly compare the change in the visual bias of perirhinal representations from before crossmodal learning to after crossmodal learning (green regions in Figure 5a vs. 5b). Specifically, the linear mixed model included learning day (before vs. after crossmodal learning) and modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object). Results revealed a significant interaction between learning day and modality in the perirhinal cortex (F1,775 = 5.56, p = 0.019, η2 = 0.071), meaning that the baseline visual shape bias observed in perirhinal cortex (green region of Figure 5a) was significantly attenuated with experience (green region of Figure 5b). After crossmodal learning, a given shape no longer invoked significant pattern similarity between objects that had the same shape but differed in terms of what they sounded like. Taken together, these results suggest that prior to learning the crossmodal objects, the perirhinal cortex had a default bias toward representing the visual shape information and was not representing sound information of the crossmodal objects. After crossmodal learning, however, the visual shape bias in perirhinal cortex was no longer present. That is, with crossmodal learning, the representations within perirhinal cortex started to look less like the visual features that comprised the crossmodal objects, providing evidence that the perirhinal representations were no longer predominantly grounded in the visual modality.” – pg. 13

      “Importantly, the initial visual shape bias observed in the perirhinal cortex was attenuated by experience (Figure 5, Supplemental Figure S5), suggesting that the perirhinal representations had become abstracted and were no longer predominantly grounded in a single modality after crossmodal learning. One possibility may be that the perirhinal cortex is by default visually driven as an extension to the ventral visual stream,10,11,12 but can act as a polymodal “hub” region for additional crossmodal input following learning.” – pg. 19

      (4) The most compelling evidence the authors provide for their theoretical claims is the finding that, in the perirhinal cortex, the unimodal feature representations on Day 2 do not correlate with the multimodal objects they comprise on Day 4. This suggests that the learned multimodal object representations are not combinations of their unimodal features. If unimodal features are not decodable within the congruent object representations, this would support the authors' explicit integrative hypothesis. However, the analyses provided do not go all the way in convincing the reader of this claim. First, the analyses reported do not differentiate between congruent and incongruent objects. If this result in the perirhinal cortex reflects the formation of new multimodal object representations, it should only be true for congruent objects but not incongruent objects. Since the analyses combine congruent and incongruent objects it is not possible to know whether this was the case. Second, just because feature representations on Day 2 do not correlate with multimodal object patterns on Day 4 does not mean that the object representations on Day 4 do not contain featural information. This could be directly tested by correlating feature representations on Day 4 with congruent vs. incongruent object representations on Day 4. It could be that representations in the perirhinal cortex are not stable over time and all representations-including unimodal feature representations-shift between sessions, which could explain these results yet not entail the existence of abstracted object representations.

      We thank the reviewer for this suggestion and have conducted the two additional analyses. Specifically, we split the congruent and incongruent conditions and also investigated correlations between unimodal representations on Day 4 with crossmodal object representations on Day 4. There was no significant interaction between modality and congruency in any ROI across or within learning days. One possible explanation for these findings is that both congruent and incongruent crossmodal objects are represented differently from their underlying unimodal features, and all of these representations can transform with experience.

      However, the new analyses also revealed that perirhinal cortex was the only region without a modality-specific bias after crossmodal learning (e.g., Day 4 Unimodal Feature runs x Day 4 Crossmodal Object runs; now shown in Supplemental Figure S5). Overall, these results are consistent with the notion of a crossmodal integrative code in perirhinal cortex that has changed with experience and is different from the component unimodal features. Nevertheless, we explore alternative interpretations for how the crossmodal code emerges with experience in the discussion.

      “To examine whether these results differed by congruency (i.e., whether any modality-specific biases differed as a function of whether the object was congruent or incongruent), we conducted exploratory linear mixed models for each of the five a priori ROIs across learning days. More specifically, we correlated: 1) the voxel-wise activity for Unimodal Feature Runs before crossmodal learning to the voxel-wise activity for Crossmodal Object Runs before crossmodal learning (Day 2 vs. Day 2), 2) the voxel-wise activity for Unimodal Feature Runs before crossmodal learning to the voxel-wise activity for Crossmodal Object Runs after crossmodal learning (Day 2 vs Day 4), and 3) the voxel-wise activity for Unimodal Feature Runs after crossmodal learning to the voxel-wise activity for Crossmodal Object Runs after crossmodal learning (Day 4 vs Day 4). For each of the three analyses described, we then conducted separate linear mixed models which included modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object) and congruency (congruent vs. incongruent)….There was no significant relationship between modality and congruency in any ROI between Day 2 and Day 2 (F1,346-368 between 0.00 and 1.06, p between 0.30 and 0.99), between Day 2 and Day 4 (F1,346-368 between 0.021 and 0.91, p between 0.34 and 0.89), or between Day 4 and Day 4 (F1,346-368 between 0.01 and 3.05, p between 0.082 and 0.93). However, exploratory analyses revealed that perirhinal cortex was the only region without a modality-specific bias and where the unimodal feature runs were not significantly correlated to the crossmodal object runs after crossmodal learning (Supplemental Figure S5).” – pg. 14

      “Taken together, the overall pattern of results suggests that representations of the crossmodal objects in perirhinal cortex were heavily influenced by their consistent visual features before crossmodal learning. However, the crossmodal object representations were no longer influenced by the component visual features after crossmodal learning (Figure 5, Supplemental Figure S5). Additional exploratory analyses did not find evidence of experience-dependent changes in the hippocampus or inferior parietal lobes (Supplemental Figure S4c-e).” – pg. 14

      “The voxel-wise matrix for Unimodal Feature runs on Day 4 was correlated to the voxel-wise matrix for Crossmodal Object runs on Day 4 (see Figure 5 in the main text for an example). We compared the average pattern similarity (z-transformed Pearson correlation) between shape (blue) and sound (orange) features specifically after crossmodal learning. Consistent with Figure 5b, perirhinal cortex was the only region without a modality-specific bias. Furthermore, perirhinal cortex was the only region where the representations of both the visual and sound features were not significantly correlated to the crossmodal objects. By contrast, every other region maintained a modality-specific bias for either the visual or sound features. These results suggest that perirhinal cortex representations were transformed with experience, such that the initial visual shape representations (Figure 5a) were no longer grounded in a single modality after crossmodal learning. Furthermore, these results suggest that crossmodal learning formed an integrative code different from the unimodal features in perirhinal cortex, as the visual and sound features were not significantly correlated with the crossmodal objects. * p < 0.05, ** p < 0.01, *** p < 0.001. Horizontal lines within brain regions indicate a significant main effect of modality. Vertical asterisks denote pattern similarity comparisons relative to 0.” – Supplemental Figure S5

      “We found that the temporal pole and perirhinal cortex – two anterior temporal lobe structures – came to represent new crossmodal object concepts with learning, such that the acquired crossmodal object representations were different from the representation of the constituent unimodal features (Figure 5, 6). Intriguingly, the perirhinal cortex was by default biased towards visual shape, but this initial visual bias was attenuated with experience (Figure 3c, 5, Supplemental Figure S5). Within the perirhinal cortex, the acquired crossmodal object concepts (measured after crossmodal learning) became less similar to their original component unimodal features (measured at baseline before crossmodal learning); Figure 5, 6, Supplemental Figure S5. This is consistent with the idea that object representations in perirhinal cortex integrate the component sensory features into a whole that is different from the sum of the component parts, which might be a mechanism by which object concepts obtain their abstraction…. As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience – such as through transformations, warping, or other factors – is an open question and an important area for future investigation.” – pg. 18

      In sum, the authors have collected a fantastic dataset that has the potential to answer questions about the formation of multimodal object representations in the brain. A more precise delineation of different theoretical accounts and additional analyses are needed to provide convincing support for the theory that “explicit integrative” multimodal object representations are formed during learning.

      We thank the reviewer for the positive comments and helpful feedback. We hope that our changes to our wording and clarifications to our methodology now more clearly support the central goal of our study: to find evidence of crossmodal integrative coding different from the original unimodal feature parts in anterior temporal lobe structures. We furthermore agree that future research is needed to delineate the structure of the integrative code that emerges with experience in the anterior temporal lobes.

      Reviewer #3 (Public Review):

      This paper uses behavior and functional brain imaging to understand how neural and cognitive representations of visual and auditory stimuli change as participants learn associations among them. Prior work suggests that areas in the anterior temporal (ATL) and perirhinal cortex play an important role in learning/representing cross-modal associations, but the hypothesis has not been directly tested by evaluating behavior and functional imaging before and after learning cross-modal associations. The results show that such learning changes both the perceived similarities amongst stimuli and the neural responses generated within ATL and perirhinal regions, providing novel support for the view that cross-modal learning leads to a representational change in these regions.

      This work has several strengths. It tackles an important question for current theories of object representation in the mind and brain in a novel and quite direct fashion, by studying how these representations change with cross-modal learning. As the authors note, little work has directly assessed representational change in ATL following such learning, despite the widespread view that ATL is critical for such representation. Indeed, such direct assessment poses several methodological challenges, which the authors have met with an ingenious experimental design. The experiment allows the authors to maintain tight control over both the familiarity and the perceived similarities amongst the shapes and sounds that comprise their stimuli so that the observed changes across sessions must reflect learned cross-modal associations among these. I especially appreciated the creation of physical objects that participants can explore and the approach to learning in which shapes and sounds are initially experienced independently and later in an associated fashion. In using multi-echo MRI to resolve signals in ventral ATL, the authors have minimized a key challenge facing much work in this area (namely the poor SNR yielded by standard acquisition sequences in ventral ATL). The use of both univariate and multivariate techniques was well-motivated and helpful in testing the central questions. The manuscript is, for the most part, clearly written, and nicely connects the current work to important questions in two literatures, specifically (1) the hypothesized role of the perirhinal cortex in representing/learning complex conjunctions of features and (2) the tension between purely embodied approaches to semantic representation vs the view that ATL regions encode important amodal/crossmodal structure.

      There are some places in the manuscript that would benefit from further explanation and methodological detail. I also had some questions about the results themselves and what they signify about the roles of ATL and the perirhinal cortex in object representation.

      We thank the reviewer for their positive feedback and address the comments in the point-by-point responses below.

      (A) I found the terms "features" and "objects" to be confusing as used throughout the manuscript, and sometimes inconsistent. I think by "features" the authors mean the shape and sound stimuli in their experiment. I think by "object" the authors usually mean the conjunction of a shape with a sound---for instance, when a shape and sound are simultaneously experienced in the scanner, or when the participant presses a button on the shape and hears the sound. The confusion comes partly because shapes are often described as being composed of features, not features in and of themselves. (The same is sometimes true of sounds). So when reading "features" I kept thinking the paper referred to the elements that went together to comprise a shape. It also comes from ambiguous use of the word object, which might refer to (a) the 3D-printed item that people play with, which is an object, or (b) a visually-presented shape (for instance, the localizer involved comparing an "object" to a "phase-scrambled" stimulus---here I assume "object" refers to an intact visual stimulus and not the joint presentation of visual and auditory items). I think the design, stimuli, and results would be easier for a naive reader to follow if the authors used the terms "unimodal representation" to refer to cases where only visual or auditory input is presented, and "cross-modal" or "conjoint" representation when both are present.

      We thank the reviewer for this suggestion and agree. We have replaced the terms “features” and “objects” with “unimodal” and “crossmodal” in the title, text, and figures throughout the manuscript for consistency (i.e., “crossmodal binding problem”). To simplify the terminology, we have also removed the localizer results.

      (B) There are a few places where I wasn't sure what exactly was done, and where the methods lacked sufficient detail for another scientist to replicate what was done. Specifically:

      (1) The behavioral study assessing perceptual similarity between visual and auditory stimuli was unclear. The procedure, stimuli, number of trials, etc, should be explained in sufficient detail in methods to allow replication. The results of the study should also minimally be reported in the supplementary information. Without an understanding of how these studies were carried out, it was very difficult to understand the observed pattern of behavioral change. For instance, I initially thought separate behavioral blocks were carried out for visual versus auditory stimuli, each presented in isolation; however, the effects contrast congruent and incongruent stimuli, which suggests these decisions must have been made for the conjoint presentation of both modalities. I'm still not sure how this worked. Additionally, the manuscript makes a brief mention that similarity judgments were made in the context of "all stimuli," but I didn't understand what that meant. Similarity ratings are hugely sensitive to the contrast set with which items appear, so clarity on these points is pretty important. A strength of the design is the contention that shape and sound stimuli were psychophysically matched, so it is important to show the reader how this was done and what the results were.

      We agree and apologize for the lack of sufficient detail in the original manuscript. We now include much more detail about the similarity rating task. The methodology and results of the behavioral rating experiments are now shown in Supplemental Figure S1. In Figure S1a, the similarity ratings are visualized on a multidimensional scaling plot. The triangular geometry for shape (blue) and sound (red) indicate that the subjective similarity was equated within each unimodal feature across individual participants. Quantitatively, there was no difference in similarity between the congruent and incongruent pairings in Figure S1b and Figure S1c prior to crossmodal learning. In addition to providing more information on these methods in the Supplemental Information, we also now provide a more detailed description of the task in the manuscript itself. For convenience, we reproduce these sections below.

      “Pairwise Similarity Task. Using the same task as the stimulus validation procedure (Supplemental Figure S1a), participants provided similarity ratings for all combinations of the 3 validated shapes and 3 validated sounds (each of the six features were rated in the context of every other feature in the set, with 4 repeats of the same feature, for a total of 72 trials). More specifically, three stimuli were displayed on each trial, with one at the top and two at the bottom of the screen in the same procedure as we have used previously27. The 3D shapes were visually displayed as a photo, whereas sounds were displayed on screen in a box that could be played over headphones when clicked with the mouse. The participant made an initial judgment by selecting the more similar stimulus on the bottom relative to the stimulus on the top. Afterwards, the participant made a similarity rating between each bottom stimulus with the top stimulus from 0 being no similarity to 5 being identical. This procedure ensured that ratings were made relative to all other stimuli in the set.”– pg. 28

      “Pairwise similarity task and results. In the initial stimulus validation experiment, participants provided pairwise ratings for 5 sounds and 3 shapes. The shapes, which had been selected from a well-characterized perceptually uniform stimulus space27, were equated in their subjective similarity, and the pairwise ratings followed the same procedure as described in ref 27. Based on this initial experiment, we then selected the 3 sounds that were most closely equated in their subjective similarity. (a) 3D-printed shapes were displayed as images, whereas sounds were displayed in a box that could be played when clicked by the participant. Ratings were averaged to produce a similarity matrix for each participant, and then averaged to produce a group-level similarity matrix. Shown as triangular representational geometries recovered from multidimensional scaling in the above, shapes (blue) and sounds (orange) were approximately equated in their subjective similarity. These features were then used in the four-day crossmodal learning task. (b) Behavioral results from the four-day crossmodal learning task paired with multi-echo fMRI described in the main text. Before crossmodal learning, there was no difference in similarity between shape and sound features associated with congruent objects compared to incongruent objects – indicating that similarity was controlled at the unimodal feature-level. After crossmodal learning, we observed a robust shift in the magnitude of similarity. The shape and sound features associated with congruent objects were now significantly more similar than the same shape and sound features associated with incongruent objects (p < 0.001), evidence that crossmodal learning changed how participants experienced the unimodal features (observed in 17/18 participants). (c) We replicated this learning-related shift in pattern similarity with a larger sample size (n = 44; observed in 38/44 participants). *** denotes p < 0.001. Horizontal lines denote the comparison of congruent vs. incongruent conditions.” – Supplemental Figure S1

      (2) The experiences through which participants learned/experienced the shapes and sounds were unclear. The methods mention that they had one minute to explore/palpate each shape and that these experiences were interleaved with other tasks, but it is not clear what the other tasks were, how many such exploration experiences occurred, or how long the total learning time was. The manuscript also mentions that participants learn the shape-sound associations with 100% accuracy but it isn't clear how that was assessed. These details are important partly b/c it seems like very minimal experience to change neural representations in the cortex.

      We apologize for the lack of detail and agree with the reviewer’s suggestions – we now include much more information in the methods section. Each behavioral day required about 1 hour of total time to complete, and indeed, participants rapidly learned their associations with minimal experience. For example:

      “Behavioral Tasks. On each behavioral day (Day 1 and Day 3; Figure 2), participants completed the following tasks, in this order: Exploration Phase, one Unimodal Feature 1-back run (26 trials), Exploration Phase, one Crossmodal 1-back run (26 trials), Exploration Phase, Pairwise Similarity Task (24 trials), Exploration Phase, Pairwise Similarity Task (24 trials), Exploration Phase, Pairwise Similarity Task (24 trials), and finally, Exploration Phase. To verify learning on Day 3, participants also additionally completed a Learning Verification Task at the end of the session.” – pg. 27

      “The overall procedure ensured that participants extensively explored the unimodal features on Day 1 and the crossmodal objects on Day 3. The Unimodal Feature and the Crossmodal Object 1-back runs administered on Day 1 and Day 3 served as practice for the neuroimaging sessions on Day 2 and Day 4, during which these 1-back tasks were completed. Each behavioral session required less than 1 hour of total time to complete.” – pg. 27

      “Learning Verification Task (Day 3 only). As the final task on Day 3, participants completed a task to ensure that participants successfully formed their crossmodal pairing. All three shapes and sounds were randomly displayed in 6 boxes on a display. Photos of the 3D shapes were shown, and sounds were played by clicking the box with the mouse cursor. The participant was cued with either a shape or sound, and then selected the corresponding paired feature. At the end of Day 3, we found that all participants reached 100% accuracy on this task (10 trials).” – pg. 29

      (3) I didn't understand the similarity metric used in the multivariate imaging analyses. The manuscript mentions Z-scored Pearson's r, but I didn't know if this meant (a) many Pearson coefficients were computed and these were then Z-scored, so that 0 indicates a value equal to the mean Pearson correlation and 1 is equal to the standard deviation of the correlations, or (b) whether a Fisher Z transform was applied to each r (so that 0 means r was also around 0). From the interpretation of some results, I think the latter is the approach taken, but in general, it would be helpful to see, in Methods or Supplementary information, exactly how similarity scores were computed, and why that approach was adopted. This is particularly important since it is hard to understand the direction of some key effects.

      The reviewer is correct that the Fisher Z transform was applied to each individual r before averaging the correlations. This approach is generally recommended when averaging correlations (see Corey, Dunlap, & Burke, 1998). We are now clearer on this point in the manuscript:

      “The z-transformed Pearson’s correlation coefficient was used as the distance metric for all pattern similarity analyses. More specifically, each individual Pearson correlation was Fisher z-transformed and then averaged (see 61).” – pg. 32
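      As a minimal sketch of the averaging step described above (not the authors' actual pipeline), the following Python snippet correlates two voxel-wise patterns and averages several correlations after a Fisher z-transform. The function names `pattern_similarity` and `fisher_z_mean` and the example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pattern_similarity(pattern_a, pattern_b):
    """Pearson correlation between two voxel-wise activity patterns."""
    return np.corrcoef(pattern_a, pattern_b)[0, 1]

def fisher_z_mean(r_values):
    """Fisher z-transform each Pearson r, then average the z values."""
    r = np.asarray(r_values, dtype=float)
    z = np.arctanh(r)  # Fisher transform: z = 0.5 * ln((1 + r) / (1 - r))
    return z.mean()

# Hypothetical correlations from several run pairs (illustrative values only)
example_r = [0.10, 0.25, -0.05]
mean_z = fisher_z_mean(example_r)
```

      Under this scheme a value of 0 corresponds to r = 0, which matches the reviewer's reading (b): comparisons of pattern similarity "relative to 0" test whether the underlying correlations differ from zero, not from the mean correlation.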

      (C) From Figure 3D, the temporal pole mask appears to exclude the anterior fusiform cortex (or the ventral surface of the ATL generally). If so, this is a shame, since that appears to be the locus most important to cross-modal integration in the "hub and spokes" model of semantic representation in the brain. The observation in the paper that the perirhinal cortex seems initially biased toward visual structure while more superior ATL is biased toward auditory structure appears generally consistent with the "graded hub" view expressed, for instance, in our group's 2017 review paper (Lambon Ralph et al., Nature Reviews Neuroscience). The balance of visual- versus auditory-sensitivity in that work appears balanced in the anterior fusiform, just a little lateral to the anterior perirhinal cortex. It would be helpful to know if the same pattern is observed for this area specifically in the current dataset.

      We thank the reviewer for this suggestion. After close inspection of Lambon Ralph et al. (2017), we believe that our perirhinal cortex mask appears to be overlapping with the ventral ATL/anterior fusiform region that the reviewer mentions. See Author response image 1 for a visual comparison:

      Author response image 1.

      The top four figures are sampled from Lambon Ralph et al (2017), whereas the bottom two figures visualize our perirhinal cortex mask (white) and temporal pole mask (dark green) relative to the fusiform cortex. The ROIs visualized were defined from the Harvard-Oxford atlas.

      We now mention this area of overlap in our manuscript and link it to the hub and spokes model:

      “Notably, our perirhinal cortex mask overlaps with a key region of the ventral anterior temporal lobe thought to be the central locus of crossmodal integration in the “hub and spokes” model of semantic representations.9,50” – pg. 20

      (D) While most effects seem robust from the information presented, I'm not so sure about the analysis of the perirhinal cortex shown in Figure 5. This compares (I think) the neural similarity evoked by a unimodal stimulus ("feature") to that evoked by the same stimulus when paired with its congruent stimulus in the other modality ("object"). These similarities show an interaction with modality prior to cross-modal association, but no interaction afterward, leading the authors to suggest that the perirhinal cortex has become less biased toward visual structure following learning. But the plots in Figures 4a and b are shown against different scales on the y-axes, obscuring the fact that all of the similarities are smaller in the after-learning comparison. Since the perirhinal interaction was already the smallest effect in the pre-learning analysis, it isn't really surprising that it drops below significance when all the effects diminish in the second comparison. A more rigorous test would assess the reliability of the interaction of comparison (pre- or post-learning) with modality. The possibility that perirhinal representations become less "visual" following cross-modal learning is potentially important so a post hoc contrast of that kind would be helpful.

      We apologize for the lack of clarity. We conducted a linear mixed model to assess the interaction between modality and crossmodal learning day (before and after crossmodal learning) in the perirhinal cortex as described by the reviewer. The critical interaction was significant, which is now clarified in the text as well as in the rescaled figure plots.

      “To investigate this effect in perirhinal cortex more specifically, we conducted a linear mixed model to directly compare the change in the visual bias of perirhinal representations from before crossmodal learning to after crossmodal learning (green regions in Figure 5a vs. 5b). Specifically, the linear mixed model included learning day (before vs. after crossmodal learning) and modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object). Results revealed a significant interaction between learning day and modality in the perirhinal cortex (F1,775 = 5.56, p = 0.019, η2 = 0.071), meaning that the baseline visual shape bias observed in perirhinal cortex (green region of Figure 5a) was significantly attenuated with experience (green region of Figure 5b). After crossmodal learning, a given shape no longer invoked significant pattern similarity between objects that had the same shape but differed in terms of what they sounded like. Taken together, these results suggest that prior to learning the crossmodal objects, the perirhinal cortex had a default bias toward representing the visual shape information and was not representing sound information of the crossmodal objects. After crossmodal learning, however, the visual shape bias in perirhinal cortex was no longer present. That is, with crossmodal learning, the representations within perirhinal cortex started to look less like the visual features that comprised the crossmodal objects, providing evidence that the perirhinal representations were no longer predominantly grounded in the visual modality.” – pg. 13
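      The logic of the learning day by modality interaction can be illustrated with a simplified stand-in. The sketch below is not the linear mixed model reported above, which was fit to many observations per participant. It assumes one Fisher z-transformed similarity value per participant per cell and tests the difference of differences against zero, which is equivalent to the interaction term in a fully within-subject 2 x 2 design. All names and numbers are hypothetical.

```python
import numpy as np

def interaction_t(vis_before, snd_before, vis_after, snd_after):
    """Mean change in visual bias and its one-sample t statistic.

    Each argument holds one value per participant (same order in all four).
    """
    vis_before, snd_before, vis_after, snd_after = (
        np.asarray(x, dtype=float)
        for x in (vis_before, snd_before, vis_after, snd_after)
    )
    bias_before = vis_before - snd_before  # visual bias before learning
    bias_after = vis_after - snd_after     # visual bias after learning
    change = bias_before - bias_after      # difference of differences
    n = change.size
    t = change.mean() / (change.std(ddof=1) / np.sqrt(n))
    return change.mean(), t  # t has n - 1 degrees of freedom

# Hypothetical per-participant similarity values (illustrative only)
mean_change, t_stat = interaction_t(
    [0.3, 0.4, 0.5], [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.2], [0.1, 0.1, 0.1],
)
```

      A positive mean change indicates that the visual bias shrank from before to after learning. Because the test operates on the bias within each session, it does not depend on the overall magnitude of similarity being comparable across sessions, which is the concern raised about the differing y-axis scales.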

      We note that not all effects drop in Figure 5b (even in regions with a similar numerical pattern similarity to PRC, like the hippocampus – also see Supplemental Figure S5 for a comparison for patterns only on Day 4), suggesting that the change in visual bias in PRC is not simply due to noise.

      “Importantly, the change in pattern similarity in the perirhinal cortex across learning days (Figure 5) is unlikely to be driven by noise, poor alignment of patterns across sessions, or generally reduced responses. Other regions with numerically similar pattern similarity to perirhinal cortex did not change across learning days (e.g., visual features x crossmodal objects in A1 in Figure 5; the exploratory ROI hippocampus with numerically similar pattern similarity to perirhinal cortex also did not change in Supplemental Figure S4c-d).” – pg. 14

      (E) Is there a reason the authors did not look at representation and change in the hippocampus? As a rapid-learning, widely-connected feature-binding mechanism, and given the fairly minimal amount of learning experience, it seems like the hippocampus would be a key area of potential import for the cross-modal association. It also looks as though the hippocampus is implicated in the localizer scan (Figure 3c).

      We thank the reviewer for this suggestion and now include additional analyses for the hippocampus. We found no evidence of crossmodal integrative coding different from the unimodal features. Rather, the hippocampus seems to represent the convergence of unimodal features, as evidenced by …[can you give some pithy description for what is meant by “convergence” vs “integration”?]. We provide these results in the Supplemental Information and describe them in the main text:

      “Analyses for the hippocampus (HPC) and inferior parietal lobe (IPL). (a) In the visual vs. auditory univariate analysis, there was no visual or sound bias in HPC, but there was a bias towards sounds that increased numerically after crossmodal learning in the IPL. (b) Pattern similarity analyses between unimodal features associated with congruent objects and incongruent objects. Similar to Supplemental Figure S3, there was no main effect of congruency in either region. (c) When we looked at the pattern similarity between Unimodal Feature runs on Day 2 to Crossmodal Object runs on Day 2, we found that there was significant pattern similarity when there was a match between the unimodal feature and the crossmodal object (e.g., pattern similarity > 0). This pattern of results held when (d) correlating the Unimodal Feature runs on Day 2 to Crossmodal Object runs on Day 4, and (e) correlating the Unimodal Feature runs on Day 4 to Crossmodal Object runs on Day 4. Finally, (f) there was no significant pattern similarity between Crossmodal Object runs before learning correlated to Crossmodal Object after learning in HPC, but there was significant pattern similarity in IPL (p < 0.001). Taken together, these results suggest that both HPC and IPL are sensitive to visual and sound content, as the (c, d, e) unimodal feature-level representations were correlated to the crossmodal object representations irrespective of learning day. However, there was no difference between congruent and incongruent pairings in any analysis, suggesting that HPC and IPL did not represent crossmodal objects differently from the component unimodal features. 
For these reasons, HPC and IPL may represent the convergence of unimodal feature representations (i.e., because HPC and IPL were sensitive to both visual and sound features), but our results do not seem to support these regions in forming crossmodal integrative coding distinct from the unimodal features (i.e., because representations in HPC and IPL did not differentiate the congruent and incongruent conditions and did not change with experience). * p < 0.05, ** p < 0.01, *** p < 0.001. Asterisks above or below bars indicate a significant difference from zero. Horizontal lines within brain regions in (a) reflect an interaction between modality and learning day, whereas horizontal lines within brain regions reflect main effects of (b) learning day, (c-e) modality, or (f) congruency.” – Supplemental Figure S4.

      “Notably, our perirhinal cortex mask overlaps with a key region of the ventral anterior temporal lobe thought to be the central locus of crossmodal integration in the “hub and spokes” model of semantic representations.9,50 However, additional work has also linked other brain regions to the convergence of unimodal representations, such as the hippocampus51,52,53 and inferior parietal lobes.54,55 This past work on the hippocampus and inferior parietal lobe does not necessarily address the crossmodal binding problem that was the main focus of our present study, as previous findings often do not differentiate between crossmodal integrative coding and the convergence of unimodal feature representations per se. Furthermore, previous studies in the literature typically do not control for stimulus-based factors such as experience with unimodal features, subjective similarity, or feature identity that may complicate the interpretation of results when determining regions important for crossmodal integration. Indeed, we found evidence consistent with the convergence of unimodal feature-based representations in both the hippocampus and inferior parietal lobes (Supplemental Figure S4), but no evidence of crossmodal integrative coding different from the unimodal features. The hippocampus and inferior parietal lobes were both sensitive to visual and sound features before and after crossmodal learning (see Supplemental Figure S4c-e). Yet the hippocampus and inferior parietal lobes did not differentiate between the congruent and incongruent conditions or change with experience (see Supplemental Figure S4).” – pg. 20

      (F) The direction of the neural effects was difficult to track and understand. I think the key observation is that TP and PRh both show changes related to cross-modal congruency - but still it would be helpful if the authors could articulate, perhaps via a schematic illustration, how they think representations in each key area are changing with the cross-modal association. Why does the temporal pole come to activate less for congruent than incongruent stimuli (Figure 3)? And why do TP responses grow less similar to one another for congruent relative to incongruent stimuli after learning (Figure 4)? Why are incongruent stimulus similarities anticorrelated in their perirhinal responses following cross-modal learning (Figure 6)?

      We thank the reviewer for identifying this issue, which was also raised by the other reviewers. The reviewer is correct that the key observation is that the TP and PRC both show changes related to crossmodal congruency (given that the unimodal features were equated in the methodological design). However, the structure of the integrative code is less clear, which we now emphasize in the main text. Our findings provide evidence of a crossmodal integrative code that is different from the unimodal features, and future studies are needed to better understand the structure of how such a code might emerge. We now more clearly highlight this distinction throughout the paper:

      “By contrast, perirhinal cortex may be involved in pattern separation following crossmodal experience. In our task, participants had to differentiate congruent and incongruent objects constructed from the same three shape and sound features (Figure 2). An efficient way to solve this task would be to form distinct object-level outputs from the overlapping unimodal feature-level inputs such that congruent objects are made to be orthogonal to the representations before learning (i.e., measured as pattern similarity equal to 0 in the perirhinal cortex; Figure 5b, 6, Supplemental Figure S5), whereas non-learned incongruent objects could be made to be dissimilar from the representations before learning (i.e., anticorrelation, measured as pattern similarity less than 0 in the perirhinal cortex; Figure 6). Because our paradigm could decouple neural responses to the learned object representations (on Day 4) from the original component unimodal features at baseline (on Day 2), these results could be taken as evidence of pattern separation in the human perirhinal cortex.11,12 However, our pattern of results could also be explained by other types of crossmodal integrative coding. For example, incongruent object representations may be less stable than congruent object representations, such that incongruent object representations are warped to a greater extent than congruent object representations (Figure 6).” – pg. 18

      “As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience – such as through transformations, warping, or other factors – is an open question and an important area for future investigation. Furthermore, these anterior temporal lobe structures may be involved with integrative coding in different ways. For example, the crossmodal object representations measured after learning were found to be related to the component unimodal feature representations measured before learning in the temporal pole but not the perirhinal cortex (Figure 5, 6, Supplemental Figure S5). Moreover, pattern similarity for congruent shape-sound pairs was lower than the pattern similarity for incongruent shape-sound pairs after crossmodal learning in the temporal pole but not the perirhinal cortex (Figure 4b, Supplemental Figure S3a). As one interpretation of this pattern of results, the temporal pole may represent new crossmodal objects by combining previously learned knowledge.8,9,10,11,13,14,15,33 Specifically, research into conceptual combination has linked the anterior temporal lobes to compound object concepts such as “hummingbird”.34,35,36 For example, participants during our task may have represented the sound-based “humming” concept and visually-based “bird” concept on Day 1, forming the crossmodal “hummingbird” concept on Day 3 (Figure 1, 2), which may recruit less activity in temporal pole than an incongruent pairing such as “barking-frog”. For these reasons, the temporal pole may form a crossmodal object code based on pre-existing knowledge, resulting in reduced neural activity (Figure 3d) and pattern similarity towards features associated with learned objects (Figure 4b).” – pg. 18

      This work represents a key step in our advancing understanding of object representations in the brain. The experimental design provides a useful template for studying neural change related to the cross-modal association that may prove useful to others in the field. Given the broad variety of open questions and potential alternative analyses, an open dataset from this study would also likely be a considerable contribution to the field.

    2. eLife assessment

      The fMRI study is important because it investigates fundamental questions about the neural basis of multimodal binding using an innovative multi-day learning approach. The results provide solid evidence for learning-related changes in the anterior temporal lobe; however, the interpretation of these changes is not straightforward, and the study does not (yet) provide direct evidence for an integrative code. This paper is of potential interest to a broad audience of neuroscientists.

    3. Reviewer #1 (Public Review):

      This study used a multi-day learning paradigm combined with fMRI to reveal neural changes reflecting the learning of new (arbitrary) shape-sound associations. In the scanner, the shapes and sounds are presented separately and together, both before and after learning. When they are presented together, they can be either consistent or inconsistent with the learned associations. The analyses focus on auditory and visual cortices, as well as the object-selective cortex (LOC) and anterior temporal lobe regions (temporal pole (TP) and perirhinal cortex (PRC)). Results revealed several learning-induced changes, particularly in the anterior temporal lobe regions. First, the LOC and PRC showed a reduced bias to shapes vs sounds (presented separately) after learning. Second, the TP responded more strongly to incongruent than congruent shape-sound pairs after learning. Third, the similarity of TP activity patterns to sounds and shapes (presented separately) was increased for non-matching shape-sound comparisons after learning. Fourth, when comparing the pattern similarity of individual features to combined shape-sound stimuli, the PRC showed a reduced bias towards visual features after learning. Finally, comparing patterns to combined shape-sound stimuli before and after learning revealed a reduced (and negative) similarity for incongruent combinations in PRC. These results are all interpreted as evidence for an explicit integrative code of newly learned multimodal objects, in which the whole is different from the sum of the parts.

      The study has many strengths. It addresses a fundamental question that is of broad interest, the learning paradigm is well-designed and controlled, and the stimuli are real 3D stimuli that participants interact with. The manuscript is well written and the figures are very informative, clearly illustrating the analyses performed.

      There are also some weaknesses. The sample size (N=17) is small for detecting the subtle effects of learning. Most of the statistical analyses are not corrected for multiple comparisons (ROIs), and the specificity of the key results to specific regions is also not tested. Furthermore, the evidence for an integrative representation is rather indirect, and alternative interpretations for these results are not considered.

    4. Reviewer #2 (Public Review):

      Li et al. used a four-day fMRI design to investigate how unimodal feature information is combined, integrated, or abstracted to form a multimodal object representation. The experimental question is of great interest and understanding how the human brain combines featural information to form complex representations is relevant for a wide range of researchers in neuroscience, cognitive science, and AI. While most fMRI research on object representations is limited to visual information, the authors examined how visual and auditory information is integrated to form a multimodal object representation. The experimental design is elegant and clever. Three visual shapes and three auditory sounds were used as the unimodal features; the visual shapes were used to create 3D-printed objects. On Day 1, the participants interacted with the 3D objects to learn the visual features, but the objects were not paired with the auditory features, which were played separately. On Day 2, participants were scanned with fMRI while they were exposed to the unimodal visual and auditory features as well as pairs of visual-auditory cues. On Day 3, participants again interacted with the 3D objects but now each was paired with one of the three sounds that played from an internal speaker. On Day 4, participants completed the same fMRI scanning runs they completed on Day 2, except now some visual-auditory feature pairs corresponded with Congruent (learned) objects, and some with Incongruent (unlearned) objects. Using the same fMRI design on Days 2 and 4 enables a well-controlled comparison between feature- and object-evoked neural representations before and after learning. The notable results corresponded to findings in the perirhinal cortex and temporal pole. The authors report (1) that a visual bias on Day 2 for unimodal features in the perirhinal cortex was attenuated after learning on Day 4, (2) a decreased univariate response to congruent vs. incongruent visual-auditory objects in the temporal pole on Day 4, (3) decreased pattern similarity between congruent vs. incongruent pairs of visual and auditory unimodal features in the temporal pole on Day 4, (4) in the perirhinal cortex, visual unimodal features on Day 2 do not correlate with their respective visual-auditory objects on Day 4, and (5) in the perirhinal cortex, multimodal object representations across Days 2 and 4 are uncorrelated for congruent objects and anticorrelated for incongruent. The authors claim that each of these results supports the theory that multimodal objects are represented in an "explicit integrative" code separate from feature representations. While these data are valuable and the results are interesting, the authors' claims are not well supported by their findings.

      (1) In the introduction, the authors contrast two theories: (a) multimodal objects are represented in the co-activation of unimodal features, and (b) multimodal objects are represented in an explicit integrative code such that the whole is different than the sum of its parts. However, the distinction between these two theories is not straightforward. An explanation of what is precisely meant by "explicit" and "integrative" would clarify the authors' theoretical stance. Perhaps we can assume that an "explicit" representation is a new representation that is created to represent a multimodal object. What is meant by "integrative" is more ambiguous: unimodal features could be integrated within a representation in a manner that preserves the decodability of the unimodal features, or alternatively the multimodal representation could be completely abstracted away from the constituent features such that the features are no longer decodable. Even if the object representation is "explicit" and distinct from the unimodal feature representations, it can in theory still contain featural information, though perhaps warped or transformed. The authors do not clearly commit to a degree of featural abstraction in their theory of "explicit integrative" multimodal object representations, which makes it difficult to assess the validity of their claims.

      (2) After participants learned the multimodal objects, the authors report a decreased univariate response to congruent visual-auditory objects relative to incongruent objects in the temporal pole. This is claimed to support the existence of an explicit, integrative code for multimodal objects. Given the number of alternative explanations for this finding, this claim seems unwarranted. A simpler interpretation of these results is that the temporal pole is responding to the novelty of the incongruent visual-auditory objects. If there is in fact an explicit, integrative multimodal object representation in the temporal pole, it is unclear why this would manifest in a decreased univariate response.

      (3) The authors ran a neural pattern similarity analysis on the unimodal features before and after multimodal object learning. They found that the similarity between visual and auditory features that composed congruent objects decreased in the temporal pole after multimodal object learning. This was interpreted to reflect an explicit integrative code for multimodal objects, though it is not clear why. First, behavioral data show that participants reported increased similarity between the visual and auditory unimodal features within congruent objects after learning, the opposite of what was found in the temporal pole. Second, it is unclear why an analysis of the unimodal features would be interpreted to reflect the nature of the multimodal object representations. Since the same features corresponded with both congruent and incongruent objects, the nature of the feature representations cannot be interpreted to reflect the nature of the object representations per se. Third, using unimodal feature representations to make claims about object representations seems to contradict the theoretical claim that explicit, integrative object representations are distinct from unimodal features. If the learned multimodal object representation exists separately from the unimodal feature representations, there is no reason why the unimodal features themselves would be influenced by the formation of the object representation. Instead, these results seem to more strongly support the theory that multimodal object learning results in a transformation or warping of feature space.

      (4) The most compelling evidence the authors provide for their theoretical claims is the finding that, in the perirhinal cortex, the unimodal feature representations on Day 2 do not correlate with the multimodal objects they comprise on Day 4. This suggests that the learned multimodal object representations are not combinations of their unimodal features. If unimodal features are not decodable within the congruent object representations, this would support the authors' explicit integrative hypothesis. However, the analyses provided do not go all the way in convincing the reader of this claim. First, the analyses reported do not differentiate between congruent and incongruent objects. If this result in the perirhinal cortex reflects the formation of new multimodal object representations, it should only be true for congruent objects but not incongruent objects. Since the analyses combine congruent and incongruent objects it is not possible to know whether this was the case. Second, just because feature representations on Day 2 do not correlate with multimodal object patterns on Day 4 does not mean that the object representations on Day 4 do not contain featural information. This could be directly tested by correlating feature representations on Day 4 with congruent vs. incongruent object representations on Day 4. It could be that representations in the perirhinal cortex are not stable over time and all representations, including unimodal feature representations, shift between sessions, which could explain these results yet not entail the existence of abstracted object representations.

      In sum, the authors have collected a fantastic dataset that has the potential to answer questions about the formation of multimodal object representations in the brain. A more precise delineation of different theoretical accounts and additional analyses are needed to provide convincing support for the theory that "explicit integrative" multimodal object representations are formed during learning.

    5. Reviewer #3 (Public Review):

      This paper uses behavior and functional brain imaging to understand how neural and cognitive representations of visual and auditory stimuli change as participants learn associations among them. Prior work suggests that areas in the anterior temporal lobe (ATL) and perirhinal cortex play an important role in learning/representing cross-modal associations, but the hypothesis has not been directly tested by evaluating behavior and functional imaging before and after learning cross-modal associations. The results show that such learning changes both the perceived similarities amongst stimuli and the neural responses generated within ATL and perirhinal regions, providing novel support for the view that cross-modal learning leads to a representational change in these regions.

      This work has several strengths. It tackles an important question for current theories of object representation in the mind and brain in a novel and quite direct fashion, by studying how these representations change with cross-modal learning. As the authors note, little work has directly assessed representational change in ATL following such learning, despite the widespread view that ATL is critical for such representation. Indeed, such direct assessment poses several methodological challenges, which the authors have met with an ingenious experimental design. The experiment allows the authors to maintain tight control over both the familiarity and the perceived similarities amongst the shapes and sounds that comprise their stimuli so that the observed changes across sessions must reflect learned cross-modal associations among these. I especially appreciated the creation of physical objects that participants can explore and the approach to learning in which shapes and sounds are initially experienced independently and later in an associated fashion. In using multi-echo MRI to resolve signals in ventral ATL, the authors have minimized a key challenge facing much work in this area (namely the poor SNR yielded by standard acquisition sequences in ventral ATL). The use of both univariate and multivariate techniques was well-motivated and helpful in testing the central questions. The manuscript is, for the most part, clearly written, and nicely connects the current work to important questions in two literatures, specifically (1) the hypothesized role of the perirhinal cortex in representing/learning complex conjunctions of features and (2) the tension between purely embodied approaches to semantic representation vs the view that ATL regions encode important amodal/crossmodal structure.

      There are some places in the manuscript that would benefit from further explanation and methodological detail. I also had some questions about the results themselves and what they signify about the roles of ATL and the perirhinal cortex in object representation.

      A) I found the terms "features" and "objects" to be confusing as used throughout the manuscript, and sometimes inconsistent. I think by "features" the authors mean the shape and sound stimuli in their experiment. I think by "object" the authors usually mean the conjunction of a shape with a sound---for instance, when a shape and sound are simultaneously experienced in the scanner, or when the participant presses a button on the shape and hears the sound. The confusion comes partly because shapes are often described as being composed of features, not features in and of themselves. (The same is sometimes true of sounds). So when reading "features" I kept thinking the paper referred to the elements that went together to comprise a shape. It also comes from ambiguous use of the word object, which might refer to (a) the 3D-printed item that people play with, which is an object, or (b) a visually-presented shape (for instance, the localizer involved comparing an "object" to a "phase-scrambled" stimulus---here I assume "object" refers to an intact visual stimulus and not the joint presentation of visual and auditory items). I think the design, stimuli, and results would be easier for a naive reader to follow if the authors used the terms "unimodal representation" to refer to cases where only visual or auditory input is presented, and "cross-modal" or "conjoint" representation when both are present.

      B) There are a few places where I wasn't sure what exactly was done, and where the methods lacked sufficient detail for another scientist to replicate what was done. Specifically:

      (1) The behavioral study assessing perceptual similarity between visual and auditory stimuli was unclear. The procedure, stimuli, number of trials, etc, should be explained in sufficient detail in methods to allow replication. The results of the study should also minimally be reported in the supplementary information. Without an understanding of how these studies were carried out, it was very difficult to understand the observed pattern of behavioral change. For instance, I initially thought separate behavioral blocks were carried out for visual versus auditory stimuli, each presented in isolation; however, the effects contrast congruent and incongruent stimuli, which suggests these decisions must have been made for the conjoint presentation of both modalities. I'm still not sure how this worked. Additionally, the manuscript makes a brief mention that similarity judgments were made in the context of "all stimuli," but I didn't understand what that meant. Similarity ratings are hugely sensitive to the contrast set with which items appear, so clarity on these points is pretty important. A strength of the design is the contention that shape and sound stimuli were psychophysically matched, so it is important to show the reader how this was done and what the results were.

      (2) The experiences through which participants learned/experienced the shapes and sounds were unclear. The methods mention that they had one minute to explore/palpate each shape and that these experiences were interleaved with other tasks, but it is not clear what the other tasks were, how many such exploration experiences occurred, or how long the total learning time was. The manuscript also mentions that participants learn the shape-sound associations with 100% accuracy but it isn't clear how that was assessed. These details are important partly because it seems like very minimal experience to change neural representations in the cortex.

      (3) I didn't understand the similarity metric used in the multivariate imaging analyses. The manuscript mentions Z-scored Pearson's r, but I didn't know if this meant (a) many Pearson coefficients were computed and these were then Z-scored, so that 0 indicates a value equal to the mean Pearson correlation and 1 is equal to the standard deviation of the correlations, or (b) whether a Fisher Z transform was applied to each r (so that 0 means r was also around 0). From the interpretation of some results, I think the latter is the approach taken, but in general, it would be helpful to see, in Methods or Supplementary information, exactly how similarity scores were computed, and why that approach was adopted. This is particularly important since it is hard to understand the direction of some key effects.
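
      The two readings of "Z-scored Pearson's r" distinguished in point (3) can be made concrete with a minimal sketch. It assumes plain NumPy and uses hypothetical helper names (`pearson_r`, `fisher_z`, `zscore_across`) and made-up correlation values; it is not taken from the manuscript's analysis code and does not indicate which option the authors used.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two 1-D activity patterns."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def fisher_z(r):
    """Reading (b): Fisher transform of each r, so r = 0 maps to z = 0."""
    return np.arctanh(r)

def zscore_across(rs):
    """Reading (a): standardize a set of r values, so 0 marks the set's mean r."""
    rs = np.asarray(rs, dtype=float)
    return (rs - rs.mean()) / rs.std(ddof=1)

# Hypothetical correlations, all positive (illustration only).
rs = np.array([0.10, 0.20, 0.30])
print("Fisher z:", fisher_z(rs))        # every value stays above zero
print("z-scored:", zscore_across(rs))   # the middle value sits at zero
```

      Under reading (a) a similarity of 0 only means "average for this set", whereas under reading (b) it means the two patterns were uncorrelated, which is why the choice matters when interpreting tests against zero.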

      C) From Figure 3D, the temporal pole mask appears to exclude the anterior fusiform cortex (or the ventral surface of the ATL generally). If so, this is a shame, since that appears to be the locus most important to cross-modal integration in the "hub and spokes" model of semantic representation in the brain. The observation in the paper that the perirhinal cortex seems initially biased toward visual structure while more superior ATL is biased toward auditory structure appears generally consistent with the "graded hub" view expressed, for instance, in our group's 2017 review paper (Lambon Ralph et al., Nature Reviews Neuroscience). The balance of visual- versus auditory-sensitivity in that work appears balanced in the anterior fusiform, just a little lateral to the anterior perirhinal cortex. It would be helpful to know if the same pattern is observed for this area specifically in the current dataset.

      D) While most effects seem robust from the information presented, I'm not so sure about the analysis of the perirhinal cortex shown in Figure 5. This compares (I think) the neural similarity evoked by a unimodal stimulus ("feature") to that evoked by the same stimulus when paired with its congruent stimulus in the other modality ("object"). These similarities show an interaction with modality prior to cross-modal association, but no interaction afterward, leading the authors to suggest that the perirhinal cortex has become less biased toward visual structure following learning. But the plots in Figures 4a and b are shown against different scales on the y-axes, obscuring the fact that all of the similarities are smaller in the after-learning comparison. Since the perirhinal interaction was already the smallest effect in the pre-learning analysis, it isn't really surprising that it drops below significance when all the effects diminish in the second comparison. A more rigorous test would assess the reliability of the interaction of comparison (pre- or post-learning) with modality. The possibility that perirhinal representations become less "visual" following cross-modal learning is potentially important so a post hoc contrast of that kind would be helpful.
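
      The test requested in point D can be sketched as a difference of differences: compute each participant's visual-minus-sound similarity before and after learning, and test whether the change in that bias differs from zero. The sketch below is a simplified paired contrast under stated assumptions (plain NumPy, hypothetical function names, simulated values for 17 participants); it is not the linear mixed model, and the numbers are not data from the study.

```python
import numpy as np

def interaction_contrast(vis_pre, snd_pre, vis_post, snd_post):
    """Per-participant change in visual bias: (vis - snd) before minus (vis - snd) after."""
    vis_pre, snd_pre, vis_post, snd_post = (
        np.asarray(x, dtype=float) for x in (vis_pre, snd_pre, vis_post, snd_post)
    )
    return (vis_pre - snd_pre) - (vis_post - snd_post)

def one_sample_t(x):
    """t statistic and degrees of freedom for H0: mean(x) == 0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = x.mean() / (x.std(ddof=1) / np.sqrt(n))
    return float(t), n - 1

# Simulated similarity values for 17 hypothetical participants.
rng = np.random.default_rng(0)
n = 17
vis_pre = rng.normal(0.30, 0.10, n)
snd_pre = rng.normal(0.10, 0.10, n)
vis_post = rng.normal(0.12, 0.10, n)
snd_post = rng.normal(0.10, 0.10, n)

contrast = interaction_contrast(vis_pre, snd_pre, vis_post, snd_post)
t, df = one_sample_t(contrast)
print(f"modality x learning-day contrast: t({df}) = {t:.2f}")
```

      Testing the contrast directly avoids inferring an interaction from one comparison being significant and the other not, which is the concern raised about the differently scaled plots.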

      E) Is there a reason the authors did not look at representation and change in the hippocampus? As a rapid-learning, widely-connected feature-binding mechanism, and given the fairly minimal amount of learning experience, it seems like the hippocampus would be a key area of potential import for the cross-modal association. It also looks as though the hippocampus is implicated in the localizer scan (Figure 3c).

      F) The direction of the neural effects was difficult to track and understand. I think the key observation is that TP and PRh both show changes related to cross-modal congruency - but still it would be helpful if the authors could articulate, perhaps via a schematic illustration, how they think representations in each key area are changing with the cross-modal association. Why does the temporal pole come to activate *less* for congruent than incongruent stimuli (Figure 3)? And why do TP responses grow less similar to one another for congruent relative to incongruent stimuli after learning (Figure 4)? Why are incongruent stimulus similarities *anticorrelated* in their perirhinal responses following cross-modal learning (Figure 6)?

      This work represents a key step in our advancing understanding of object representations in the brain. The experimental design provides a useful template for studying neural change related to the cross-modal association that may prove useful to others in the field. Given the broad variety of open questions and potential alternative analyses, an open dataset from this study would also likely be a considerable contribution to the field.